Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[StackAnalyzer] Analyze the static Stack for composite. #486

Open
wants to merge 130 commits into
base: main
Choose a base branch
from

Conversation

spadek67424
Copy link

@spadek67424 spadek67424 commented Aug 14, 2024

Summary of this Pull Request (PR)

Add description here.

Intent for your PR

Choose one (Mandatory):

  • This PR is for a code-review and is intended to get feedback, but not to be pulled yet.
  • This PR is mature, and ready to be integrated into the repo.

Reviewers (Mandatory):

(Specify @<github.com username(s)> of the reviewers. Ex: @user1, @user2)

Code Quality

As part of this pull request, I've considered the following:

Style:

  • Comments adhere to the Style Guide (SG)
  • Spacing adhere's to the SG
  • Naming adhere's to the SG
  • All other aspects of the SG are adhered to, or exceptions are justified in this pull request
  • I have run the auto formatter on my code before submitting this PR (see doc/auto_formatter.md for instructions)

Code Craftsmanship:

  • I've made an attempt to remove all redundant code
  • I've considered ways in which my changes might impact existing code, and cleaned it up
  • I've formatted the code in an effort to make it easier to read (proper error handling, function use, etc...)
  • I've commented appropriately where code is tricky
  • I agree that there is no "throw-away" code, and that code in this PR is of high quality

Testing

I've tested the code using the following test programs (provide list here):

  • micro_booter
  • unit_pingpong
  • unit_schedtests
  • ...(add others here)

@spadek67424 spadek67424 changed the title [StackAnalyzer] T [StackAnalyzer] Analyze the static Stack for composite. Aug 14, 2024
@@ -0,0 +1,36 @@
import unittest # The test framework
import sys
sys.path.insert(0, '/home/minghwu/work/StackAnalyzer/pyelftool_parser/src')
Copy link
Contributor

@mbai1010 mbai1010 Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that we may better use a relative path here

## test printc
self.assertEqual(self.stacklist[12], -416)
if __name__ == '__main__':
path = "./tests.unit_pingpong.global.ping"
Copy link
Contributor

@mbai1010 mbai1010 Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should remove the part between the if statement and unittest.main(). They are duplicated with setUp(self) above

@@ -0,0 +1,203 @@
import sys
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you likely want the entire pyelftool to be in tools/

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why I try to merge from my own repo. Then it keeps moving it to main folder. I think it is my command line problems. We could move it afterward after we decide how to integrate with system.

pc_flag = 1
if (pc_flag == 1): ## catch the point until the next symbol, means it is exit point.
self.exit_pc = i.address
log(f'0x{i.address:x}:\t{i.mnemonic}\t{i.op_str}')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't run this, but does this log quite a bit to stdout/err? We'd like the output to be relatively quiet in most cases.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is a debug.py. It is actually a log system. We could decide what level log to stdout.

self.entry_pc = symbol['st_value']
log("Set up entry point")
log(hex(self.entry_pc))
if(symbol.name == 'custom_acquire_stack'):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I understanding correctly that you don't treat the __cosrt_s_* symbols as entry points? Is the logic that they each call custom_acquire_stack, so it is OK to start tracing from there? That might miss some stack operations. I think that __cosrt_upcall_entry also uses that symbol, so you might do redundant computations.

I'm likely just missing something here ;-)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for that. I just test the __cosrt_upcall_entry. I will give it a look.

if self.register.reg["pc"] in self.symbol.keys(): ## check function block (as basic block but we use function as unit.)
self.stackfunction.append(self.symbol[self.register.reg["pc"]])
logstack(self.symbol[self.register.reg["pc"]]) ## TODO: here is error.
self.register.updatestackreg(self.symbol[self.register.reg["pc"]] == 'custom_acquire_stack') ## if it is acquiring stack address, do not setting the stack size.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't follow a lot of this logic. Not sure why a stackfunction is doing operations with the pc, or why the pc indexes a symbol correctly. I'm just having a hard time following all of this, which means I likely can't give a useful review.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to fix the comment here. The comments is misleading. I did not notice that. Thank you for giving the ideas of it. But the idea is that I will calculate the stack size whenever we jump or flow to a new function block. I track how much the stack pointer changed, and I store the changing into an array.


#### set up next instruction pc

if (self.index == index_list.index(self.register.reg["pc"])): ## fetch next instruction
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all quite a bit more complicated than I was expecting. Great job figuring it all out!

self.stackfunction.append(self.symbol[self.register.reg["pc"]])
logstack(self.symbol[self.register.reg["pc"]]) ## TODO: here is error.
self.register.updatestackreg(self.symbol[self.register.reg["pc"]] == 'custom_acquire_stack') ## if it is acquiring stack address, do not setting the stack size.
self.stacklist.append(self.register.reg["stack"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We found an API in the library that told you if an instruction modifies stacks. Is this the API? I'm looking through here trying to find the stack logic.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is Executor(executor.py). I pass each instruction into it. And I use capstone to check if it is modifing the rsp. There is a "flagrsp" in executor to check it.

############################################
##------------------------------------------
## execute stage.
if flagrsp: ## if rsp is in the instruction
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a part of the API that tells you if an instruction modifies the stack? Aren't those the only instructions you need to worry about?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use capstone's "(regs_read, regs_write) = inst.regs_access()". Check that is it reading/writing rsp? This regs_access can also specify the implication read/write rsp.

sys.add_objs_iter(&c_id, ElfObject::transition_iter(c_id, &sys, &mut build)?);
sys.add_invs_iter(&c_id, Invocations::transition_iter(c_id, &sys, &mut build)?);
println!("path:{}", obj.get_path());
let output = Command::new("python3")
.arg("/home/minghwu/work/minghwu/composite/pyelftool_parser/src/analyzer.py")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll want to use relative paths here so that you can avoid hardcoding the absolute path.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is indeed a problem. I do not figure out how to put into tools. Therefore, I put absolute path for this moment. I will fix it afterward. Thank you for pointing out.

sys.add_objs_iter(&c_id, ElfObject::transition_iter(c_id, &sys, &mut build)?);
sys.add_invs_iter(&c_id, Invocations::transition_iter(c_id, &sys, &mut build)?);
println!("path:{}", obj.get_path());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless a print is for a specific purpose (other than debugging), we don't want to add them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry. It is a debugging mistake.

} else {
let stderr = String::from_utf8_lossy(&output.stderr);
eprintln!("Script error: {}", stderr);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all of this, we'll want to add a pass. This is likely in your other code, so I'm hesitate to say much here.

@gparmer
Copy link
Collaborator

gparmer commented Aug 27, 2024

I'd like to understand a couple of things:

  1. The complexity of the analysis of the stack usage -- I'm likely just missing where you're using the library API feature to tell you if a stack is updated in an instruction.
  2. Simple updates to structure (put it in tools/, remove absolute paths, figure out how to add a pass in the composer and use the output of this program.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants