Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix pointer to reference type #113596

Merged
merged 1 commit into from
Oct 25, 2024

Conversation

jeffreytan81
Copy link
Contributor

We have got customer reporting "v &obj" and "p &obj" reporting different results.
Turns out it only happens for obj that is itself a reference type which "v &obj" reports the address of the reference itself instead of the target object the reference points to. This diverged from C++ semantics.

This PR fixes this issue by returning the address of the dereferenced object if it is reference type.

A new test is added which fails before.

@jeffreytan81 jeffreytan81 marked this pull request as ready for review October 24, 2024 17:47
@llvmbot llvmbot added the lldb label Oct 24, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 24, 2024

@llvm/pr-subscribers-lldb

Author: None (jeffreytan81)

Changes

We have got customer reporting "v &obj" and "p &obj" reporting different results.
Turns out it only happens for obj that is itself a reference type which "v &obj" reports the address of the reference itself instead of the target object the reference points to. This diverged from C++ semantics.

This PR fixes this issue by returning the address of the dereferenced object if it is reference type.

A new test is added which fails before.


Full diff: https://github.com/llvm/llvm-project/pull/113596.diff

3 Files Affected:

  • (modified) lldb/source/Core/ValueObject.cpp (+10)
  • (modified) lldb/test/API/lang/cpp/dereferencing_references/TestCPPDereferencingReferences.py (+21)
  • (modified) lldb/test/API/lang/cpp/dereferencing_references/main.cpp (+2)
diff --git a/lldb/source/Core/ValueObject.cpp b/lldb/source/Core/ValueObject.cpp
index 5b1c171c01f2db..df0393213343fa 100644
--- a/lldb/source/Core/ValueObject.cpp
+++ b/lldb/source/Core/ValueObject.cpp
@@ -2911,6 +2911,16 @@ ValueObjectSP ValueObject::AddressOf(Status &error) {
 
   AddressType address_type = eAddressTypeInvalid;
   const bool scalar_is_load_address = false;
+
+  // For reference type we need to get the address of the object that
+  // it refers to.
+  ValueObjectSP deref_obj;
+  if (GetCompilerType().IsReferenceType()) {
+    deref_obj = Dereference(error);
+    if (error.Fail() || !deref_obj)
+      return ValueObjectSP();
+    return deref_obj->AddressOf(error);
+  }
   addr_t addr = GetAddressOf(scalar_is_load_address, &address_type);
   error.Clear();
   if (addr != LLDB_INVALID_ADDRESS && address_type != eAddressTypeHost) {
diff --git a/lldb/test/API/lang/cpp/dereferencing_references/TestCPPDereferencingReferences.py b/lldb/test/API/lang/cpp/dereferencing_references/TestCPPDereferencingReferences.py
index 938fb1a6edf32c..1374d4e1ec67ab 100644
--- a/lldb/test/API/lang/cpp/dereferencing_references/TestCPPDereferencingReferences.py
+++ b/lldb/test/API/lang/cpp/dereferencing_references/TestCPPDereferencingReferences.py
@@ -25,3 +25,24 @@ def test(self):
         # Typedef to a reference should dereference to the underlying type.
         td_val = self.expect_var_path("td_to_ref_type", type="td_int_ref")
         self.assertEqual(td_val.Dereference().GetType().GetName(), "int")
+
+    def test_take_address_of_reference(self):
+        """Tests taking address of lvalue/rvalue references in lldb works correctly."""
+        self.build()
+        lldbutil.run_to_source_breakpoint(
+            self, "// break here", lldb.SBFileSpec("main.cpp")
+        )
+
+        plref_val_from_code = self.expect_var_path("pl_ref", type="TTT *")
+        plref_val_from_expr_path = self.expect_var_path("&l_ref", type="TTT *")
+        self.assertEqual(
+            plref_val_from_code.GetValueAsAddress(),
+            plref_val_from_expr_path.GetValueAsAddress(),
+        )
+
+        prref_val_from_code = self.expect_var_path("pr_ref", type="TTT *")
+        prref_val_from_expr_path = self.expect_var_path("&r_ref", type="TTT *")
+        self.assertEqual(
+            prref_val_from_code.GetValueAsAddress(),
+            prref_val_from_expr_path.GetValueAsAddress(),
+        )
diff --git a/lldb/test/API/lang/cpp/dereferencing_references/main.cpp b/lldb/test/API/lang/cpp/dereferencing_references/main.cpp
index b64978a9029f81..4ddffd167ddeed 100644
--- a/lldb/test/API/lang/cpp/dereferencing_references/main.cpp
+++ b/lldb/test/API/lang/cpp/dereferencing_references/main.cpp
@@ -9,5 +9,7 @@ int main() {
   // typedef of a reference
   td_int_ref td_to_ref_type = i;
 
+  TTT *pl_ref = &l_ref;
+  TTT *pr_ref = &r_ref;
   return l_ref; // break here
 }

@jeffreytan81 jeffreytan81 force-pushed the fix_pointer_reference branch from 5e84fcf to 65b0349 Compare October 24, 2024 21:07
@jeffreytan81 jeffreytan81 merged commit 25909b8 into llvm:main Oct 25, 2024
7 checks passed
@frobtech frobtech mentioned this pull request Oct 25, 2024
@labath labath requested a review from jimingham October 30, 2024 14:32
@labath
Copy link
Collaborator

labath commented Oct 30, 2024

Let's back up a bit. Before this patch, both the ValueObject class and the "frame var" command treated references essentially as pointers (in the example, xp is a pointer to x and xpr is a reference to xp):

(lldb) v *x x &x
(int) ::x = 47
(int *) &::x = 0x0000555555558010
error: not a pointer or reference type: (int) ::x
(lldb) v *xp xp &xp
(int) *xp = 47
(int *) ::xp = 0x0000555555558010
(int **) &::xp = 0x0000555555558018
(lldb) v *xpr xpr &xpr
(int *) *xpr = 0x0000555555558010
(int *&) ::xpr = 0x0000555555558018: {
  &::xpr = 0x0000555555558010
}
(int *&*) &::xpr = 0x0000555555557dc8

Now, you may or may not agree with that behavior, but at least its consistent: notice how "dereferencing" xpr gives you the value of xp and taking the address of xpr gives you its "address" (in spite of what the C++ standard says, the reference does exist somewhere and that place has an address).

After this patch, we get this:

(lldb) v *xpr xpr &xpr
(int *) *xpr = 0x0000555555558010
(int *&) ::xpr = 0x0000555555558018: {
  &::xpr = 0x0000555555558010
}
(int **) &&::xpr = 0x0000555555558018

I.e. taking the "address of" the reference gives you the same thing as its "value", "dereferencing" the reference still gives you the object that is being referenced (it does not dereference the pointer behind it like you would expect according to the c++ rules), and there's no way to determine the location of the reference.

So, either this patch is incorrect (if you believe that the current behavior was intentional) or incomplete (if you believe that we should follow the c++ rules). Either way, I believe we should revert this and discuss the problem in more detail.

FWIW, I agree with what you're trying to achieve (making "frame var" follow c++ rules for references). I just think this needs to be handled more wholistically. And I'm not entirely sure that the fix belongs in the ValueObject class.(*) I also believe there should be a way to get the "location" of the reference (but that doesn't have to be exposed through "frame var")

(*) I'm not saying it does not belong there either, but I think you could make a stronger case for treating references like "fancy pointers" in the ValueObject class -- i.e. maintaining the status quo there.

@jeffreytan81
Copy link
Contributor Author

I haven't had time to look into your example in details.

But to share high level context about the bug behavior, we have observed cases that, there is real world scenarios using std::vector<T>& object, which data formatters, and "v obj" reports this vector "size=13", but there is another function accepting std::vector<T> * as parameter, so &obj is passed. In both locals window for "&obj" and "v &obj" reports "size=0" which is apparently very confusing to end users.
To make things even worse, in VSCode, if your mouse hover "&obj", it will try to "frame var &obj" instead of "frame var obj" which makes this bug behavior even more prevalent.

@labath
Copy link
Collaborator

labath commented Oct 31, 2024

I can believe that this behavior is undesirable. Like I said, I totally understand why you did this. It's the how I have a problem with. I don't believe the we've sufficiently considered the bigger picture. Why does AddressOf handle references transparently and Dereference does not? Are there any other methods that need to be changed to make the ValueObject interface consistent? Is this even the right level (we're about six layers away from VSCode) to implement this behavior?

If you think about it even the implementation of the patch is a bit strange -- this is the first "address of" operation that's implemented in terms of "dereferencing" (and if we changed "dereference" to be transparent, this implementation would break).

Please don't take this personally. I can certainly see how you could have though you're making an obvious bugfix here. But, in light of what I've just said, I think this needs more discussion (and I think this ought to be reverted until that is done). @jimingham, as the own^Wmaintainter of the ValueObject library, what do you think?

@labath
Copy link
Collaborator

labath commented Oct 31, 2024

Regarding the vector summary example, it looks like that there's still a lot of room for improvement there. This is what I get before this patch:

(lldb) v v vp vr vpr &v &vp &vr &vpr
(std::vector<int>) v = size=2 {
  [0] = 0
  [1] = 0
}
(std::vector<int> *) vp = 0x00007fffffffd7b0 size=2
(std::vector<int> &) vr = size=2: {
  [0] = 0
  [1] = 0
}
(std::vector<int> *&) vpr = size=1: {
  &vpr = 0x00007fffffffd7b0 size=2
}
(std::vector<int> *) &v = 0x00007fffffffd7b0 size=2
(std::vector<int> **) &vp = 0x00007fffffffd7a0 size=1
(std::vector<int> &*) &vr = 0x00007fffffffd790 size=1
(std::vector<int> *&*) &vpr = 0x00007fffffffd788 size=1

And this comes after it:

(lldb) v v vp vr vpr &v &vp &vr &vpr
(std::vector<int>) v = size=2 {
  [0] = 0
  [1] = 0
}
(std::vector<int> *) vp = 0x00007fffffffd7b0 size=2
(std::vector<int> &) vr = size=2: {
  [0] = 0
  [1] = 0
}
(std::vector<int> *&) vpr = size=1: {
  &vpr = 0x00007fffffffd7b0 size=2
}
(std::vector<int> *) &v = 0x00007fffffffd7b0 size=2
(std::vector<int> **) &vp = 0x00007fffffffd7a0 size=1
(std::vector<int> *) &&vr = 0x00007fffffffd7b0 size=2
(std::vector<int> **) &&vpr = 0x00007fffffffd7a0 size=1

The output for &vr changes (for the better, I guess), but vpr, &vp and &vpr are still extremely dubios, so if you want to tackle this issue seriously, then we also need to discuss what should the data formatters/summary providers do when faced with nested pointers and references.

@jimingham
Copy link
Collaborator

jimingham commented Oct 31, 2024

First off, it is not the case that the "ValueObject Path Expression" operators - part of what we're starting to call the DIL (Data Inspection Language) - are supposed to exactly mirror C++ semantics. Since they reflect all the ways that a synthetic child provider might present its children, and since this part of lldb is independent of language - it gets used for C/ObjC/C++/Swift/Rust etc values, not just C++ - I don't think this would be a feasible or particularly useful requirement.

So "be the same as C++ semantics" is not a sufficient justification to change how the path expressions get interpreted in lldb. The better motivation is "what is a useful and consistent syntax for static poking at data objects". The question "where does this reference live" isn't a particularly useful question to ask in C++ code, but it does seem like a useful question to ask when poking around in data. And since we're generally treating & to mean "what is the location of this object in memory" in the DIL, then interpreting the & in this case the way it was before the patch seems more natural.

I agree with Pavel, we should take a step back and actually define what & and * will mean in the DIL, and then implement that.

@jeffreytan81
Copy link
Contributor Author

Technical these make sense. Please let me know how to revert a PR (or feel free to revert for me if I do not have permission).

@vogelsgesang
Copy link
Member

FWIW, codelldb also dereferences pointers and references when displaying values. They provide a user-facing setting for it and implement it in the debug adapter, instead of at the core of LLDB

NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
We have got customer reporting "v &obj" and "p &obj" reporting different
results.
Turns out it only happens for obj that is itself a reference type which
"v &obj" reports the address of the reference itself instead of the
target object the reference points to. This diverged from C++ semantics.

This PR fixes this issue by returning the address of the dereferenced
object if it is reference type.

A new test is added which fails before.

Co-authored-by: jeffreytan81 <jeffreytan@fb.com>
labath added a commit to labath/llvm-project that referenced this pull request Nov 4, 2024
This reverts commit 25909b8 due to
unresolved questions about the behavior of "frame var" and ValueObject
in the presence of references (see the original patch for discussion).
@labath
Copy link
Collaborator

labath commented Nov 4, 2024

Thanks for your understanding. Here's a revert PR (#114831). I'll submit it after the CI runs. Would you like to create an RFC thread to continue this discussion?

labath added a commit that referenced this pull request Nov 5, 2024
This reverts commit 25909b8 due to
unresolved questions about the behavior of "frame var" and ValueObject
in the presence of references (see the original patch for discussion).
PhilippRados pushed a commit to PhilippRados/llvm-project that referenced this pull request Nov 6, 2024
This reverts commit 25909b8 due to
unresolved questions about the behavior of "frame var" and ValueObject
in the presence of references (see the original patch for discussion).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants