Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add macho parser for use by C inject_hash #1435

Merged
merged 36 commits into from
Apr 15, 2024

Conversation

billbo-yang
Copy link
Contributor

Issues:

Addresses CryptoAlg-2240

Description of changes:

Adds a macho file parser and tests in anticipation of usage in an upcoming C language replacement for inject_hash.go

Call-outs:

Since this is only meant to be used in a specific use-case, it's reading capabilities are tailored to only read & store information on the parts of the macho file that we're interested in (__test section, __const section, the string table, and the symbol table. We can open another PR to add additional capabilities if the need arises.

Testing:

Tests pass locally

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

@billbo-yang billbo-yang requested a review from a team as a code owner February 6, 2024 20:30
@torben-hansen torben-hansen self-requested a review February 6, 2024 20:46
@codecov-commenter
Copy link

codecov-commenter commented Feb 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.17%. Comparing base (1e4601e) to head (613e5ae).
Report is 25 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1435      +/-   ##
==========================================
+ Coverage   77.01%   77.17%   +0.15%     
==========================================
  Files         426      426              
  Lines       71738    71449     -289     
==========================================
- Hits        55252    55139     -113     
+ Misses      16486    16310     -176     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

CMakeLists.txt Show resolved Hide resolved
util/fipstools/inject_hash/macho_parser/macho_parser.c Outdated Show resolved Hide resolved
util/fipstools/inject_hash/macho_parser/macho_parser.c Outdated Show resolved Hide resolved
util/fipstools/inject_hash/macho_parser/macho_parser.c Outdated Show resolved Hide resolved
util/fipstools/inject_hash/macho_parser/macho_parser.h Outdated Show resolved Hide resolved
Comment on lines 99 to 100
fseek(file, macho->sections[i].offset, SEEK_SET);
fread(section_data, 1, macho->sections[i].size, file);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the return values on these calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

            if( 0 != fseek(file, macho->sections[i].offset, SEEK_SET)) {
              free(section_data);      // if section_data is local to loop
              LOG_ERROR("Failed to seek in file %s", filename);
              goto end;
            }


uint8_t* get_macho_section_data(const char *filename, machofile *macho, const char *section_name, size_t *size, uint32_t *offset) {
FILE *file = NULL;
uint8_t *section_data = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move the declaration of section_data down to the first place it's assigned to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm under the impression that I need to leave it out of the for loop in order to be able to free it, unless I'm misunderstanding you here?

Copy link
Contributor

@justsmth justsmth Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could have it like this:

...
    for (uint32_t i = 0; i < macho->num_sections; i++) {
        if (strcmp(macho->sections[i].name, section_name) == 0) {
            uint8_t *section_data = malloc(macho->sections[i].size); // declaration
            if (section_data == NULL) {
                LOG_ERROR("Error allocating memory for section data");
                goto end;
            }

            fseek(file, macho->sections[i].offset, SEEK_SET);
            uint32_t bytes_read = fread(section_data, 1, macho->sections[i].size, file);
            if (bytes_read != macho->sections[i].size) {
                free(section_data);                                // free on error
                LOG_ERROR("Error reading section data from file %s", filename);
                goto end;
            }
...

util/fipstools/inject_hash/macho_parser/macho_parser.c Outdated Show resolved Hide resolved
util/fipstools/inject_hash/macho_parser/macho_parser.c Outdated Show resolved Hide resolved
util/fipstools/inject_hash/macho_parser/macho_parser.h Outdated Show resolved Hide resolved
util/fipstools/inject_hash/macho_parser/macho_parser.c Outdated Show resolved Hide resolved
if (strcmp(segment->segname, "__TEXT") == 0) {
section_data *sections = (section_data *)&segment[1];
for (uint32_t j = 0; j < segment->nsects; j++) {
if (strcmp(sections[j].sectname, "__text") == 0 || strcmp(sections[j].sectname, "__const") == 0) {
Copy link
Contributor

@justsmth justsmth Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of incrementing a section_index counter and using it as an index, why not just have an explicit index at which each section (e.g., "__text" at 0, "__const" at 1, etc.) is stored?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current implementation is more easily expandable if we ever decide that we're interested in more than just the __text and __const sections, and I'm unsure if changing it around provides any added benefit to our current use-case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having a known location for each section could make it more robust to potential(?) changes in the order that sections are processed. I don't see it limiting expansion to handle other sections; sections being added to the processing would likewise be assigned to a specific index.

int const_found = 0;
int symtab_found = 0;

for (uint32_t i = 0; i < macho->macho_header.sizeofcmds / sizeof(struct load_command); i += load_commands[i].cmdsize / sizeof(struct load_command)) {
Copy link
Contributor

@justsmth justsmth Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic looks suspcious. Is load_commands[i].cmdsize not always the same as sizeof(struct load_command)? This logic assumes that it can be an exact multiple (>1). A comment here should explain the "why".

Otherwise, if the two values are equal it could be simplified:

    const uint32_t num_cmds = macho->macho_header.sizeofcmds / sizeof(struct load_command);
    for (uint32_t i = 0; i < num_cmds; i += 1) {
...

Copy link
Contributor Author

@billbo-yang billbo-yang Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason this works is that sizeof(struct load_command) results in 8 bytes (the struct consists of two uint32_ts). The documentation in loader.h says that .cmdsize is always a multiple of 8 on 64bit systems and .sizeofcmds is the sum of all present .cmdsize fields, so this should work here.

I'll add a comment explaining this, but I don't think your simplified code works since your code will increment i by sizeof(struct load_command) (which will always be 8) instead of the actual size of the command provided by load_commands[i].cmdsize.

torben-hansen
torben-hansen previously approved these changes Mar 27, 2024
.strsize = symtab_command_strsize,
};

if (fwrite(&test_header, sizeof(struct mach_header_64), 1, file) != 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dunno what the definition are of these structures. But generally sizeof(struct) != <sum of size of fields in struct>.

This should work locally here, because you use sizeof(struct) to be both write and read. But in general, this is not the correct way to serialise.

int symtab_found = 0;

// mach-o/loader.h explains that cmdsize (and by extension sizeofcmds) must be a multiple of 8 on 64-bit systems. struct load_command will always be 8 bytes.
for (uint32_t i = 0; i < macho->macho_header.sizeofcmds / sizeof(struct load_command); i += load_commands[i].cmdsize / sizeof(struct load_command)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (uint32_t i = 0; i < macho->macho_header.sizeofcmds / sizeof(struct load_command); i += load_commands[i].cmdsize / sizeof(struct load_command)) {
for (size_t i = 0; i < macho->macho_header.sizeofcmds / sizeof(struct load_command); i += load_commands[i].cmdsize / sizeof(struct load_command)) {

Idiomatic to use size_t. If one can speak of idiomatic C. Same for the other occurrences.

justsmth
justsmth previously approved these changes Apr 3, 2024
@billbo-yang billbo-yang enabled auto-merge (squash) April 15, 2024 19:32
@billbo-yang billbo-yang merged commit 2d45531 into aws:main Apr 15, 2024
46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants