Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Over four entries in levels causes abnormally high temperature reading #124

Closed
dbxnr opened this issue Feb 6, 2021 · 6 comments
Closed
Labels
bug Behavior not as intended feedback wanted

Comments

@dbxnr
Copy link

dbxnr commented Feb 6, 2021

Seem to have found a bug of sorts, at least on my setup (thinkfan 1.2.1, Lenovo X201), if there are more than four entries in thinkfan.conf it causes an abnormally high temperature reading triggering the highest fan level.

This is my .conf

sensors:
  - hwmon: /sys/devices/platform/coretemp.0/hwmon/hwmon4/temp2_input
  - hwmon: /sys/devices/platform/coretemp.0/hwmon/hwmon4/temp4_input
  - hwmon: /sys/devices/virtual/thermal/thermal_zone0/hwmon1/temp1_input
  
  - tpacpi: /proc/acpi/ibm/thermal
    indices: [1]

fans:
  - tpacpi: /proc/acpi/ibm/fan

levels:
  - [1, 0, 60]
  - [6, 60, 80]
  - [7, 80, 95]
  - ["level disengaged", 95, 3276]

Gives an output something like this

~ ❯ sudo thinkfan -n                                                      
Temperatures(bias): 32(0), 34(0), 40(0), 5(0) -> level 0
^CCleaning up and resetting fan control.

However if levels was something like

  - [0, 0, 40
  - [1, 4, 60]
  - [6, 60, 80]
  - [7, 80, 95]
  - ["level disengaged", 95, 3276]

thinkfan -n outputs this, with a weird temperature reading that causes the highest fan level to trigger..

~ ❯ sudo thinkfan -n                                                      
Temperatures(bias): 32(0), 34(0), 41(0), 21955(0) -> level disengaged
^CCleaning up and resetting fan control.
@vmatare
Copy link
Owner

vmatare commented Apr 15, 2021

Hi, this behavior looks very weird to me, and I find it almost impossible to believe it's really related to the number of levels you have. That's in particular because in the previous output you give, thinkfan appears to read 5 from /proc/acpi/ibm/thermal which to me doesn't look any more sane than the 21955 you're getting below. It's my impression that thinkfan is reading garbage from /proc/acpi/ibm/thermal in both cases.

Could you please try to reproduce the behavior once more and then post here the output of the following command:

cat /proc/acpi/ibm/thermal

@vmatare vmatare added can't repro Developer can't reproduce described behavior need info labels Apr 15, 2021
@calvinrw
Copy link
Contributor

I'm seeing a very similar behavior in my setup using thinkfan 1.2 (1.2.1-3, from Debian Unstable) on my ThinkPad W500, where specifying more than 3 indices of /proc/acpi/ibm/thermal results in obviously garbage temperature values being reported and used.

My desired thinkfan.yaml, with all the sensors I want, is as follows. I basically 'ported' these settings from the thinkfan.conf that I've been using since ~2011 on this computer.

sensors:
  - tpacpi: /proc/acpi/ibm/thermal
    indices: [0, 1, 2, 6, 8, 9]
fans:
  - tpacpi: /proc/acpi/ibm/fan

levels:
  - [0, 0,  22]
  - [1, 21, 33]
  - [2, 32, 38]
  - [4, 37, 44]
  - [6, 43, 55]
  - [7, 54, 60]
  - ["level disengaged",  59,  32767]

Results:

root@calvin-w500 # cat /proc/acpi/ibm/thermal 
temperatures:	42 40 35 -1 50 -128 34 -128 38 44 41 -128 -128 -128 -128 -128

root@calvin-w500 # thinkfan -v -n -c ./thinkfan.yaml
/proc/acpi/ibm/fan: Saved initial state: auto.
Temperatures(bias): 42(0), 40(0), 35(0), 67(0), -1707701994(0), 22139(0) -> level disengaged
^CCleaning up and resetting fan control.
/proc/acpi/ibm/fan: Restoring initial state: auto.

Those are some... interesting temperatures, alright.

However, if I change the indices line to [0, 1, 2], thinkfan's output reports what I'd expect to see:

root@calvin-w500 # cat /proc/acpi/ibm/thermal 
temperatures:	44 41 36 -1 50 -128 34 -128 39 45 42 -128 -128 -128 -128 -128

root@calvin-w500 # thinkfan -v -n -c ./thinkfan.yaml
/proc/acpi/ibm/fan: Saved initial state: auto.
Temperatures(bias): 44(0), 41(0), 36(0) -> level 6

Similar results with strange temperature values can be also seen when using larger subsets of the sensor list ([0, 1, 2, 4]).

I'm happy to do more testing or provide any more info to help get this sorted out.

@calvinrw
Copy link
Contributor

I've done a little more testing, and the issue I'm seeing doesn't seem to be related to the number of indices specified, but the indices themselves. For example,

sensors:
  - tpacpi: /proc/acpi/ibm/thermal
    indices: [2, 8]

will yield:

root@calvin-w500 # cat /proc/acpi/ibm/thermal 
temperatures:	43 40 37 -1 50 -128 33 -128 39 44 42 -128 -128 -128 -128 -128

root@calvin-w500 # ./thinkfan -v -n -c ./thinkfan.yaml 
/proc/acpi/ibm/fan: Saved initial state: auto.
Temperatures(bias): 37(0), 72(0) -> Fans: level disengaged
^CCleaning up and resetting fan control.
/proc/acpi/ibm/fan: Restoring initial state: auto.

72C is not an outrageous value, especially for a Core2Duo T9900 with an undersized heatsink, but no sensor is reporting that value.

In the meantime, I discovered that the updated sensor support in thinkfan 1.0+ is quite nice, and I was able to get more or less what I wanted using coretemp instead:

sensors:
  - hwmon: /sys/class/hwmon
    name: coretemp
    indices: [2, 3]
  - tpacpi: /proc/acpi/ibm/thermal
    indices: [0, 1, 2]

So, I was able to solve my issue another way, but there's almost definitely a bug in how the legacy sensor is being read/interpreted.

@vmatare
Copy link
Owner

vmatare commented Apr 27, 2021

Yup, looks like there might be some uninitialized memory or a buffer overflow happening here. However I'm currently quite busy, so I can't make promises when I'll find the time to investigate properly.

If anyone wants to help, try reproducing the issue on a DEBUG build running in valgrind --track-origins=yes. If valgrind finds anything, that might already give us a good hint on where to look for the problem.

@vmatare vmatare added bug Behavior not as intended and removed need info can't repro Developer can't reproduce described behavior labels Aug 30, 2021
@vmatare
Copy link
Owner

vmatare commented Aug 30, 2021

Should be fixed in the current master now, specifically by 6b9e348. Please test and let me know whether the issue is resolved for you.

@calvinrw
Copy link
Contributor

calvinrw commented Sep 4, 2021

The issue appears to be fixed indeed.

My thinkfan.yaml:

sensors:
  - tpacpi: /proc/acpi/ibm/thermal
    indices: [0, 1, 2, 4, 6]
fans:
  - tpacpi: /proc/acpi/ibm/fan

levels:
  - [0, 0,  22]
  - [1, 21, 33]
  - [2, 32, 38]
  - [4, 37, 44]
  - [6, 43, 55]
  - [7, 54, 60]
  - ["level disengaged",  59,  32767]

Output while compiling a small project:

calvin@calvin-w500 $ sudo cat /proc/acpi/ibm/thermal 
temperatures:	48 40 38 -1 50 -128 30 -128

calvin@calvin-w500 $ sudo thinkfan -n -v -c ./thinkfan.yaml 
Temperatures(bias): 49(0), 40(0), 38(0), 50(0), 30(0) -> Fans: level 6
Temperatures(bias): 52(4), 40(0), 38(0), 50(0), 30(0) -> Fans: level 7
Temperatures(bias): 57(7), 40(0), 38(0), 50(0), 30(0) -> Fans: level disengaged
Temperatures(bias): 56(0), 39(0), 38(0), 50(0), 30(0) -> Fans: level 7
Temperatures(bias): 50(0), 40(0), 38(0), 50(0), 30(0) -> Fans: level 6
^CCleaning up and resetting fan control.

It's working exactly how I expect/want it to. Thank you for providing the fix and for notifying us.

@vmatare vmatare closed this as completed Sep 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Behavior not as intended feedback wanted
Projects
None yet
Development

No branches or pull requests

3 participants