Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

up_squared: samples/boards/up_squared/gpio_counter failed. #28551

Closed
chen-png opened this issue Sep 21, 2020 · 30 comments · Fixed by #35312
Closed

up_squared: samples/boards/up_squared/gpio_counter failed. #28551

chen-png opened this issue Sep 21, 2020 · 30 comments · Fixed by #35312
Assignees
Labels
area: Tests Issues related to a particular existing or missing test bug The issue is a bug, or the PR is fixing a bug platform: X86 x86 and x86-64 priority: low Low impact/importance bug
Milestone

Comments

@chen-png
Copy link
Collaborator

Describe the bug
when running samples/boards/up_squared/gpio_counter on up_squared board, have configured the GPIO pins according to readme file, but it reported an error "ERROR: cannot set HAT pin 37 to OUT (-22)".

To Reproduce
Steps to reproduce the behavior:

  1. west build -b up_squared -p auto samples/boards/up_squared/gpio_counter
  2. west flash ~/up_squared.sh
  3. see error

Logs and console output
*** Booting Zephyr OS build v2.4.0-rc2 ***
ERROR: cannot set HAT pin 37 to OUT (-22)

Environment (please complete the following information):

  • OS: Fedora31
  • Toolchain: zephyr-sdk-0.11.4
  • Commit ID: 122bc38
@chen-png chen-png added bug The issue is a bug, or the PR is fixing a bug area: Tests Issues related to a particular existing or missing test labels Sep 21, 2020
@dcpleung
Copy link
Member

Have you set up the BIOS according to README and main.c? By default, all those pins are not set as GPIOs and cannot be used as such.

@MaureenHelm MaureenHelm added the priority: low Low impact/importance bug label Sep 22, 2020
@dcpleung dcpleung added the Waiting for response Waiting for author's response label Sep 22, 2020
@chen-png
Copy link
Collaborator Author

Have you set up the BIOS according to README and main.c?

yes, I have configured BIOS accroding to README in main.c file.

@dcpleung dcpleung removed the Waiting for response Waiting for author's response label Sep 23, 2020
@dcpleung dcpleung assigned chen-png and unassigned dcpleung Sep 23, 2020
@chen-png
Copy link
Collaborator Author

chen-png commented Nov 5, 2020

git bisect and found the first bad commit is 9f84637 .

@nashif nashif assigned jenmwms and unassigned chen-png Dec 11, 2020
@jenmwms
Copy link
Collaborator

jenmwms commented Jan 11, 2021

I can reproduce this issue. Thanks @chen-png for bisecting to 9f84637 , I'll look at it and sync with @andrewboie on suggestions.

@jenmwms
Copy link
Collaborator

jenmwms commented Jan 12, 2021

I'm looking into the device mapping here. A few notes about this GPIO/HAT Pin (37), it is 3rd of 4 GPIOs listed as OUT in https://github.com/zephyrproject-rtos/zephyr/blob/master/samples/boards/up_squared/gpio_counter/src/main.c, and it is the only of those 4 GPIOs that does not have a function-mode associated (see https://wiki.up-community.org/Pinout_UP2) toggled in the BIOS.

Also, looking at recent patch that might be related #31274.

@jenmwms
Copy link
Collaborator

jenmwms commented Jan 13, 2021

^that patch #31274 is EHL and UART driver, not UP2 and GPIO driver. Might not help here.

@jenmwms
Copy link
Collaborator

jenmwms commented Jan 14, 2021

@chen-png @dcpleung Can you please help me understand the value "(-22)" in the error message? I see it is ret from main.c line 126, but not sure which is associated with -22 (displayed differently than -E...)

@jenmwms
Copy link
Collaborator

jenmwms commented Jan 14, 2021

Perhaps it is -EINVAL: Invalid argument GPIO, which is consistent with linux error codes...

@dcpleung
Copy link
Member

The error values are defined in lib/libc/minimal/include/errno.h: -22 is -EINVAL.

@jenmwms
Copy link
Collaborator

jenmwms commented Jan 14, 2021

It get's through the get_gpio_dev() ok (return 0, successful) in this check in main.c:

for (i = 0; i < NUM_PINS; i++) {
		if (get_gpio_dev(&counter_pins[i]) != 0) {
			return;
		}
	}

so something is happening with the arguments, for only HAT Pin 37, to gpio_pin_configure in

		ret = gpio_pin_configure(counter_pins[i].gpio_dev,
					 counter_pins[i].pin,
					 GPIO_OUTPUT_LOW);

@jenmwms
Copy link
Collaborator

jenmwms commented Jan 15, 2021

Interesting, this seems to effect OUT and IN pins, depending on something TBD. If I remove HAT pin 37 from the structure _pin counter_pins[], the 3 remaining pins in the array are successfully set to OUT (no error). Then we instead observe error with setting the input pin:

Booting Zephyr OS build zephyr-v2.4.0-3009-gfc1b5de4c307 
ERROR: cannot set HAT pin 16 to IN (-22)

I'll try printk in the gpio driver (gpio_intel_apl.c) to see where it is failing. It is not clear if/how the patch at bisect would be a breaking point.

@jenmwms
Copy link
Collaborator

jenmwms commented Jan 15, 2021

building on that, also interesting if I avoid HAT pin 37 and 16, and instead use HAT pin 38 as IN not OUT, I can progress to the first counter value but does not advance:

Booting Zephyr OS build zephyr-v2.4.0-3009-gfc1b5de4c307 
counter: 0x0

@chen-png
Copy link
Collaborator Author

building on that, also interesting if I avoid HAT pin 37 and 16, and instead use HAT pin 38 as IN not OUT, I can progress to the first counter value but does not advance:

Booting Zephyr OS build zephyr-v2.4.0-3009-gfc1b5de4c307 
counter: 0x0

did you short the related pins after you changed the input pin number?

@jenmwms
Copy link
Collaborator

jenmwms commented Jan 19, 2021

did you short the related pins after you changed the input pin number?

Yes, thanks for checking. Counter does not increment. Curious.
Edit: I also tried this modified configuration (using different GPIO as INPUT), back at the point when the sample works well by default, and the counter does not increment with the modification there either. That route may not help debug atm. Stepping back to study the driver more.

@jenmwms
Copy link
Collaborator

jenmwms commented Jan 20, 2021

Ok, the driver gets the -EINVAL error at the failed permission check for the pin (HAT pin 37). Wonder what might cause this to start failing. Looking into this. Here are my debug prints in the driver (sample is unchanged):

--- a/drivers/gpio/gpio_intel_apl.c
+++ b/drivers/gpio/gpio_intel_apl.c
@@ -232,10 +232,14 @@ static int gpio_intel_apl_config(const struct device *dev,
         raw_pin = cfg->pin_offset + pin;
 
+       printk("TEST GPIO DRIVER before check_perm\n pin %d offset %d raw_pin %d\n", pin, cfg->pin_offset, raw_pin);
+
        if (!check_perm(dev, raw_pin)) {
                return -EINVAL;
        }
 
+       printk("TEST GPIO DRIVER after check_perm ok\n");

which gave this output (each gpio device has an iteration of the driver, see the error in the 2nd gpio in the array)

Booting Zephyr OS build zephyr-v2.4.0-3265-g401ee4ae5bf0  ***
TEST GPIO DRIVER before check_perm
 pin 17 offset 0 raw_pin 17
TEST GPIO DRIVER after check_perm ok
TEST GPIO DRIVER before check_perm
 pin 14 offset 32 raw_pin 46
ERROR: cannot set HAT pin 37 to OUT (-22)

@andrewboie
Copy link
Contributor

I think a PR to add LOG_ERR() statements to all failed runtime checks in the driver might be useful for debugging issues like this in the future

@jenmwms
Copy link
Collaborator

jenmwms commented Jan 20, 2021

PR to add LOG_ERR() statements

Great suggestion! I can work on that. I started with printk but doesn't help long term like logging statements.

@jenmwms
Copy link
Collaborator

jenmwms commented Jan 20, 2021

A bit more on the printk study from earlier...here is the output from when the sample works. Interesting that the 2 pins (HAT Pin 37 and HAT Pin 16) which give us trouble as we move up the tree, both have an offset of 32:

Booting Zephyr OS build zephyr-v2.3.0-1094-g9e627ed6d2ed  ***
TEST GPIO DRIVER before check_perm
 pin 17 offset 0 raw_pin 17
TEST GPIO DRIVER after check_perm ok
TEST GPIO DRIVER before check_perm
 pin 14 offset 32 raw_pin 46
TEST GPIO DRIVER after check_perm ok
TEST GPIO DRIVER before check_perm
 pin 18 offset 0 raw_pin 18
TEST GPIO DRIVER after check_perm ok
TEST GPIO DRIVER before check_perm
 pin 19 offset 0 raw_pin 19
TEST GPIO DRIVER after check_perm ok
TEST GPIO DRIVER before check_perm
 pin 5 offset 32 raw_pin 37
TEST GPIO DRIVER after check_perm ok
counter: 0x0
counter: 0x1

@jenmwms
Copy link
Collaborator

jenmwms commented Jan 21, 2021

Ok, since the permission check failed for the pin (HAT pin 37), diving into check_perm() which does two checks for the pin under configuration: check if Host SW owns the pin, and check if function of pad is GPIO. Here, I can narrow the failure down to the latter case. This is snippet where the check fails:

	offset = data->pad_base + (raw_pin << 3);
	val = sys_read32(regs(dev) + offset);
	if (val & PAD_CFG0_PMODE_MASK) {
		/* mode is not zero => not functioning as GPIO */
		printk("Function of pad is NOT as GPIO\n");
		return false;
	}

@jenmwms
Copy link
Collaborator

jenmwms commented Feb 2, 2021

I suspect we are reading the incorrect address, but still investigating. The offset seems to have something to do with it based on testing above (both pins with 32-bit offset encounter issue.)

@jenmwms
Copy link
Collaborator

jenmwms commented Feb 18, 2021

Indeed this looks to be an address issue. I narrowed it down to this change from 9f84637
image

pad_base is not correct after this change. Investigating root cause, may be due to DEVICE_MMIO_NAMED_MAP( ) or regs(dev). Also possible that this change helped expose a bug we did not see before.

Couple of observations are the now-inconsistent value of data->pad_base and the reg base address used to calculate pad_base (see above snippet). The driver's init function is hit 10 times, despite only 5 GPIOs selected in the sample. Not an issue, but a consistent pattern.

Adding printk to show data->pad_base and the equivalent result of the static inline regs(dev) like this:

@@ -533,6 +544,8 @@ int gpio_intel_apl_init(const struct device *dev)
 
        DEVICE_MMIO_NAMED_MAP(dev, reg_base, K_MEM_CACHE_NONE);
        data->pad_base = sys_read32(regs(dev) + REG_PAD_BASE_ADDR);
+       mm_reg_t namedMMIO = DEVICE_MMIO_NAMED_GET(dev, reg_base);
+       printk("gpio init              pad_base 0x%x,                    namedMMIO 0x%x\n", data->pad_base, (uint32_t) (namedMMIO));

results in this for the code after 9f84637 (see lines starting with gpio init pad_base below):

gpio init              pad_base 0xffffffff,                    namedMMIO 0xafff7001
gpio init              pad_base 0x500,                    namedMMIO 0xafff6000
gpio init              pad_base 0xffffffff,                    namedMMIO 0xafff4001
gpio init              pad_base 0x500,                    namedMMIO 0xafff3000
gpio init              pad_base 0xffffffff,                    namedMMIO 0xafff1002
gpio init              pad_base 0xffffffff,                    namedMMIO 0xaffef001
gpio init              pad_base 0x500,                    namedMMIO 0xaffee000
gpio init              pad_base 0xffffffff,                    namedMMIO 0xaffec002
gpio init              pad_base 0xffffffff,                    namedMMIO 0xaffea001
gpio init              pad_base 0x500,                    namedMMIO 0xaffe9000
*** Booting Zephyr OS build zephyr-v2.5.0-188-gfc0cdba03ed1  ***
before check_perm                              pin 17,      offset 0,   raw_pin 17,   namedMMIO 0xafff3000
>> check_perm read pad owner                   offset 40,   val 0,   REG_PAD_OWNER_BASE 0x20
>>>> Host SW owns the pin OK (bits)            offset 1,   val 0,   PAD_OWN_MASK 0x3
>>>> Function of pad is GPIO OK
check_perm ok
next device
before check_perm                              pin 14,      offset 32,   raw_pin 46,   namedMMIO 0xaffef001
>> check_perm read pad owner                   offset 52,   val ffffffff,   REG_PAD_OWNER_BASE 0x20
>>>> Host SW does NOT own the pin (bits)       offset 6,   val 3,   PAD_OWN_MASK 0x3
ERROR: cannot set HAT pin 37 to OUT (-22)

The pad_base 0xffffffff is suspicious, confirmed with @andyross. I'm not sure if the values should be same or unique when using the MMIO mapping based on regs(dev).

printk of data->pad_base when the sample worked was (see lines starting with gpio init pad_base below):

gpio init              pad_base 0x500,                    cfgRegBase 0x910000
gpio init              pad_base 0x500,                    cfgRegBase 0x910000
gpio init              pad_base 0x500,                    cfgRegBase 0x910000
gpio init              pad_base 0x500,                    cfgRegBase 0x910000
gpio init              pad_base 0x500,                    cfgRegBase 0x910000
gpio init              pad_base 0x500,                    cfgRegBase 0x910000
gpio init              pad_base 0x500,                    cfgRegBase 0x910000
gpio init              pad_base 0x500,                    cfgRegBase 0x910000
gpio init              pad_base 0x500,                    cfgRegBase 0x910000
gpio init              pad_base 0x500,                    cfgRegBase 0x910000
*** Booting Zephyr OS build zephyr-v2.3.0-1095-ga18d1d801c6f  ***
before check_perm                              pin 17,      offset 0,   raw_pin 17,   cfgRegBase 0x910000
>> check_perm read pad owner                   offset 40,   val 0,   REG_PAD_OWNER_BASE 0x20
>>>> Host SW owns the pin OK (bits)            offset 1,   val 0,   PAD_OWN_MASK 0x3
>>>> Function of pad is GPIO OK
check_perm ok
next device
before check_perm                              pin 14,      offset 32,   raw_pin 46,   cfgRegBase 0x910000
>> check_perm read pad owner                   offset 52,   val 0,   REG_PAD_OWNER_BASE 0x20
>>>> Host SW owns the pin OK (bits)            offset 6,   val 0,   PAD_OWN_MASK 0x3
>>>> Function of pad is GPIO OK
check_perm ok
next device
before check_perm                              pin 18,      offset 0,   raw_pin 18,   cfgRegBase 0x910000
>> check_perm read pad owner                   offset 40,   val 0,   REG_PAD_OWNER_BASE 0x20
>>>> Host SW owns the pin OK (bits)            offset 2,   val 0,   PAD_OWN_MASK 0x3
>>>> Function of pad is GPIO OK
check_perm ok
next device
before check_perm                              pin 19,      offset 0,   raw_pin 19,   cfgRegBase 0x910000
>> check_perm read pad owner                   offset 40,   val 0,   REG_PAD_OWNER_BASE 0x20
>>>> Host SW owns the pin OK (bits)            offset 3,   val 0,   PAD_OWN_MASK 0x3
>>>> Function of pad is GPIO OK
check_perm ok
next device
before check_perm                              pin 5,      offset 32,   raw_pin 37,   cfgRegBase 0x910000
>> check_perm read pad owner                   offset 48,   val 0,   REG_PAD_OWNER_BASE 0x20
>>>> Host SW owns the pin OK (bits)            offset 5,   val 0,   PAD_OWN_MASK 0x3
>>>> Function of pad is GPIO OK
check_perm ok
next device
>> check_perm read pad owner                   offset 48,   val 0,   REG_PAD_OWNER_BASE 0x20
>>>> Host SW owns the pin OK (bits)            offset 5,   val 0,   PAD_OWN_MASK 0x3
>>>> Function of pad is GPIO OK
counter: 0x0

@jenmwms
Copy link
Collaborator

jenmwms commented Mar 3, 2021

I'm looking into why the data->pad_base is getting set as 0xffffffff, when its expected to be 0x500. It has something to do with the result of the static inline regs(dev), perhaps it is undefined behavior with using different types.

@jenmwms
Copy link
Collaborator

jenmwms commented Mar 3, 2021

  1. Today OOC I tried the sample on up_squared_32 and it too fails after 9f84637 (MMIO-related changes). Before that bisect, it works. This appears to be another symptom.

I started looking at it because that patch modified how we get the reg_base. The value in gpio_intel_apl_config was a uint32_t reg_base, but updated to use DEVICE_MMIO_NAMED_ROM(reg_base) instead

In API docs, we see

#define DEVICE_MMIO_NAMED_ROM ( name ) struct z_device_mmio_rom name
and that struct is defined in include/sys/device_mmio.h
image

So, maybe there's an issue with these data types as it propagates.

  1. Also looking at the definition of the helper macros. The use of _dev is different than some other implementations which are thoroughly tested, such as uart_ns16550.c

@jenmwms
Copy link
Collaborator

jenmwms commented Mar 17, 2021

This is still on my radar. Sorry for the delay.

@jenmwms
Copy link
Collaborator

jenmwms commented Mar 26, 2021

Discussed this with Spoorthy offline and we are looking into possible issues with data types, as mentioned earlier about the struct member changing from a uint32_t to a struct with different size. This may have introduced some overflow or undefined behavior.

@jenmwms
Copy link
Collaborator

jenmwms commented Apr 10, 2021

Hardcoding pad_base=0x500 to overcome the suspicious 0xffffffff value did not improve the issue. All gpio devices are init'd with it, but still same issue. Perhaps that is only one aspect.

Reviewing the prints in previous comments, I saw another difference in addition to pad_base:
val was 0 when it worked. After the bisected patch, val is non-zero.
Before/Working

>> check_perm read pad owner                   offset 52,   val 0,   REG_PAD_OWNER_BASE 0x20
>>>> Host SW owns the pin OK (bits)            offset 6,    val 0,   PAD_OWN_MASK 0x3

After/Error

>> check_perm read pad owner                   offset 52,   val ffffffff,   REG_PAD_OWNER_BASE 0x20
>>>> Host SW does NOT own the pin (bits)       offset 6,    val 3,          PAD_OWN_MASK 0x3

@jenmwms
Copy link
Collaborator

jenmwms commented Apr 10, 2021

I'm digging around this part for regs(dev) (snippet below) because I found that even if I brute force through the check_perms by hardcoding pad_base to expected 0x500 and all the val to 0, this trips in the next stage of configuring interrupts.

reg = regs(dev) + data->pad_base + (raw_pin * 8U);

where

static inline mm_reg_t regs(struct device *dev)
{
	return DEVICE_MMIO_NAMED_GET(dev, reg_base);
}

@jenmwms
Copy link
Collaborator

jenmwms commented Apr 14, 2021

Noticed a couple things when I was looking at this alongside the tests for DEVICE_MMIO_NAMED_* macros (tests/kernel/device/src/mmio.c). Using those as example implementations, I'm considering:

  • We appear to get valid MMIO addresses. So why does the calculation of pad_base have incorrect result (0xffffffff) for the first few iterations of early driver initialization, and then flip flop between correct (0x500) and incorrect through the early initialization?
  • Is there an issue with gpio_intel_apl.c driver implementation of DEVICE_MMIO_NAMED_*? Try using alternative?
  • Does the gpio_intel_apl.c driver need to use the DEVICE_MMIO_NAMED_ROM_PTR() instead of _GET()?

@jenmwms
Copy link
Collaborator

jenmwms commented Apr 17, 2021

Worked backward on the mapping, leading to z_phys_map(). Building with CONFIG_ASSERT=y does throw any asserts for this sample (was making sure the z_phys_map() did not throw asserts). The *virt_ptr (which is reg_base in this case) appears to be valid address, and we observe it does function as expected for some of the GPIOs defined in this sample. Subtle bug that might be in z_phys_map(), so also looking into this more with Flavio.

@jenmwms
Copy link
Collaborator

jenmwms commented May 7, 2021

This issue still present after move to unified gpio driver for Intel SoC #34719

@jenmwms jenmwms assigned dcpleung and unassigned jenmwms May 12, 2021
@nashif nashif added this to the v2.6.0 milestone May 12, 2021
@LeiW000 LeiW000 added the platform: X86 x86 and x86-64 label May 13, 2021
dcpleung added a commit to dcpleung/zephyr that referenced this issue May 14, 2021
On Apollo Lake, each GPIO controller has more than 32 pins.
But Zephyr API can only manipulate 32 pins per controller.
So the workaround is to divide each hardware GPIO controller
into 32-pin blocks so each block has a GPIO driver instance.
Compounding to the issue is that there cannot be two device
tree nodes with same register address. So another workaround
is to increment the register addresses by 1 for each block.
So when mapping the address, the lowest 8-bit needs to be
masked to get the actual hardware address.

Fixes zephyrproject-rtos#28551

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
nashif pushed a commit that referenced this issue May 19, 2021
On Apollo Lake, each GPIO controller has more than 32 pins.
But Zephyr API can only manipulate 32 pins per controller.
So the workaround is to divide each hardware GPIO controller
into 32-pin blocks so each block has a GPIO driver instance.
Compounding to the issue is that there cannot be two device
tree nodes with same register address. So another workaround
is to increment the register addresses by 1 for each block.
So when mapping the address, the lowest 8-bit needs to be
masked to get the actual hardware address.

Fixes #28551

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Tests Issues related to a particular existing or missing test bug The issue is a bug, or the PR is fixing a bug platform: X86 x86 and x86-64 priority: low Low impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants