同步torvalds的内核修改。 #1

wutiejun · 2014-10-25T02:31:52Z

No description provided.

Move the platform detection function to separate functions to allow easier maintenence. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>

To simplify some of the platform detection code. Move the platform detection to a function to be called earlier. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>

Instead of using a module parameter, we should detect the errata via PCI DID and then set an appropriate flag. This will be used for additional errata later on. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>

On the Haswell platform, a split BAR option to allow creation of 2 32bit BARs (4 and 5) from the 64bit BAR 4. Adding support for this new option. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

If kprobes is disabled uprobes will not compile. Fix this by including the correct header files. Signed-off-by: Jan Willeke <willeke@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

So that when an evsel is embedded into other struct it can free up resources calling perf_evsel__exit(). Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jean Pihet <jean.pihet@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-n1w68pfe9m2vkhm4sqs8y1en@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

It was being found, by chance, because evsel.h needlessly includes util/cgroup.h, which will be sorted out in a following patch. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jean Pihet <jean.pihet@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-xsvxr747wkkpg1ay9dramorr@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

The only thing we need is a forward declaration for 'struct cgroup_sel', that is inside 'struct perf_evsel'. Include cgroup.h instead on the tools that support cgroups. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jean Pihet <jean.pihet@linaro.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-b7kuymbgf0zxi5viyjjtu5hk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

zatimend has reported that in his environment (3.16/gcc4.8.3/corei7) memset() calls which clear out sensitive data in extract_{buf,entropy, entropy_user}() in random driver are being optimized away by gcc. Add a helper memzero_explicit() (similarly as explicit_bzero() variants) that can be used in such cases where a variable with sensitive data is being cleared out in the end. Other use cases might also be in crypto code. [ I have put this into lib/string.c though, as it's always built-in and doesn't need any dependencies then. ] Fixes kernel bugzilla: 82041 Reported-by: zatimend@hotmail.co.uk Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org

Recently, in commit 13aa93c ("random: add and use memzero_explicit() for clearing data"), we have found that GCC may optimize some memset() cases away when it detects a stack variable is not being used anymore and going out of scope. This can happen, for example, in cases when we are clearing out sensitive information such as keying material or any e.g. intermediate results from crypto computations, etc. With the help of Coccinelle, we can figure out and fix such occurences in the crypto subsytem as well. Julia Lawall provided the following Coccinelle program: @@ type T; identifier x; @@ T x; ... when exists when any -memset +memzero_explicit (&x, -0, ...) ... when != x when strict @@ type T; identifier x; @@ T x[...]; ... when exists when any -memset +memzero_explicit (x, -0, ...) ... when != x when strict Therefore, make use of the drop-in replacement memzero_explicit() for exactly such cases instead of using memset(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Theodore Ts'o <tytso@mit.edu>

This simplifies the lanai.c driver by using the module_pci_driver() macro, at the expense of losing only debugging messages. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>

commit 971f10e ("tcp: better TCP_SKB_CB layout to reduce cache line misses") missed that cookie_v4_check() still calls ip_options_echo() which uses IPCB(). It should use TCPCB() at TCP layer, so call __ip_options_echo() instead. Fixes: commit 971f10e ("tcp: better TCP_SKB_CB layout to reduce cache line misses") Cc: Krzysztof Kolasa <kkolasa@winsoft.pl> Cc: Eric Dumazet <edumazet@google.com> Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>

cookie_v4_check() allocates ip_options_rcu in the same way with tcp_v4_save_options(), we can just make it a helper function. Cc: Krzysztof Kolasa <kkolasa@winsoft.pl> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>

We can retrieve opt from skb, no need to pass it as a parameter. And opt should always be non-NULL, no need to check. Cc: Krzysztof Kolasa <kkolasa@winsoft.pl> Cc: Eric Dumazet <edumazet@google.com> Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>

Adding period data column to be displayed in perf script. It's possible to get period values using -f option, like: $ perf script -f comm,tid,time,period,ip,sym,dso :26019 26019 52414.329088: 3707 ffffffff8105443a native_write_msr_safe ([kernel.kallsyms]) :26019 26019 52414.329088: 44 ffffffff8105443a native_write_msr_safe ([kernel.kallsyms]) :26019 26019 52414.329093: 1987 ffffffff8105443a native_write_msr_safe ([kernel.kallsyms]) :26019 26019 52414.329093: 6 ffffffff8105443a native_write_msr_safe ([kernel.kallsyms]) ls 26019 52414.329442: 537558 3407c0639c _dl_map_object_from_fd (/usr/lib64/ld-2.17.so) ls 26019 52414.329442: 2099 3407c0639c _dl_map_object_from_fd (/usr/lib64/ld-2.17.so) ls 26019 52414.330181: 1242100 34080917bb get_next_seq (/usr/lib64/libc-2.17.so) ls 26019 52414.330181: 3774 34080917bb get_next_seq (/usr/lib64/libc-2.17.so) ls 26019 52414.331427: 1083662 ffffffff810c7dc2 update_curr ([kernel.kallsyms]) ls 26019 52414.331427: 360 ffffffff810c7dc2 update_curr ([kernel.kallsyms]) Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: "Jen-Cheng(Tommy) Huang" <tommy24@gatech.edu> Cc: Andi Kleen <andi@firstfloor.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jen-Cheng(Tommy) Huang <tommy24@gatech.edu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1408977943-16594-9-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Adding period as a default output column in script command fo hardware, software and raw events. If PERF_SAMPLE_PERIOD sample type is defined in perf.data, following will be displayed in perf script output: $ perf script ls 8034 57477.887209: 250000 task-clock: ffffffff81361d72 memset ([kernel.kallsyms]) ls 8034 57477.887464: 250000 task-clock: ffffffff816f6d92 _raw_spin_unlock_irqrestore ([kernel.kallsyms]) ls 8034 57477.887708: 250000 task-clock: ffffffff811a94f0 do_munmap ([kernel.kallsyms]) ls 8034 57477.887959: 250000 task-clock: 34080916c6 get_next_seq (/usr/lib64/libc-2.17.so) ls 8034 57477.888208: 250000 task-clock: 3408079230 _IO_doallocbuf (/usr/lib64/libc-2.17.so) ls 8034 57477.888717: 250000 task-clock: ffffffff814242c8 n_tty_write ([kernel.kallsyms]) ls 8034 57477.889285: 250000 task-clock: 3408076402 fwrite_unlocked (/usr/lib64/libc-2.17.so) Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: "Jen-Cheng(Tommy) Huang" <tommy24@gatech.edu> Cc: Andi Kleen <andi@firstfloor.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jen-Cheng(Tommy) Huang <tommy24@gatech.edu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1408977943-16594-10-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

ip_setup_cork() called inside ip_append_data() steals dst entry from rt to cork and in case errors in __ip_append_data() nobody frees stolen dst entry Fixes: 2e77d89 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()") Signed-off-by: Vasily Averin <vvs@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>

pskb_may_pull() called by arphdr_ok can change skb->data, so put the arp setting after arphdr_ok to avoid the use the freed memory Fixes: 0714812 ("openvswitch: Eliminate memset() from flow_extract.") Cc: Jesse Gross <jesse@nicira.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>

pskb_may_pull maybe change skb->data and make eth pointer oboslete, so eth needs to reload Fixes: 91269e3 ("vxlan: using pskb_may_pull as early as possible") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>

If megaflows are disabled, the userspace does not send the netlink attribute OVS_FLOW_ATTR_MASK, and the kernel must create an exact match mask. sw_flow_mask_set() sets every bytes (in 'range') of the mask to 0xff, even the bytes that represent padding for struct sw_flow, or the bytes that represent fields that may not be set during ovs_flow_extract(). This is a problem, because when we extract a flow from a packet, we do not memset() anymore the struct sw_flow to 0. This commit gets rid of sw_flow_mask_set() and introduces mask_set_nlattr(), which operates on the netlink attributes rather than on the mask key. Using this approach we are sure that only the bytes that the user provided in the flow are matched. Also, if the parse_flow_mask_nlattrs() for the mask ENCAP attribute fails, we now return with an error. This bug is introduced by commit 0714812 ("openvswitch: Eliminate memset() from flow_extract"). Reported-by: Alex Wang <alexw@nicira.com> Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Steven French <smfrench@gmail.com>

In case that the IP header has optional field at the end, this patch will get the port numbers after that field, and compute the hash. The general parser skb_flow_dissect() is used here. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>

pskb_may_pull() maybe change skb->data and make eth pointer oboslete, so set eth after pskb_may_pull() Fixes:3d7b46cd("ip_tunnel: push generic protocol handling to ip_tunnel module") Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>

pskb_may_pull() maybe change skb->data and make uh pointer oboslete, so reload uh and guehdr Fixes: 37dd024 ("gue: Receive side for Generic UDP Encapsulation") Cc: Tom Herbert <therbert@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>

Remove calling cancel_delayed_work_sync() for runtime suspend, because it would cause dead lock. Instead, return -EBUSY to avoid the device enters suspending if the net is running and the delayed work is pending or running. The delayed work would try to wake up the device later, so the suspending is not necessary. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>

Don't ring the doorbell, and don't do PIO. This will also prevent TX Push, because there will be more than one buffer waiting when the doorbell is rung. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>

Commit 971f10e ("tcp: better TCP_SKB_CB layout to reduce cache line misses") added a regression for SO_BINDTODEVICE on IPv6. This is because we still use inet6_iif() which expects that IP6 control block is still at the beginning of skb->cb[] This patch adds tcp_v6_iif() helper and uses it where necessary. Because __inet6_lookup_skb() is used by TCP and DCCP, we add an iif parameter to it. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 971f10e ("tcp: better TCP_SKB_CB layout to reduce cache line misses") Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>

In commit ec8a2e5 ("tipc: same receive code path for connection protocol and data messages") we omitted the the possiblilty that an arriving message extracted from a bundle buffer may be a multicast message. Such messages need to be to be delivered to the socket via a separate function, tipc_sk_mcast_rcv(). As a result, small multicast messages arriving as members of a bundle buffer will be silently dropped. This commit corrects the error by considering this case in the function tipc_link_bundle_rcv(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>

Commit b4d2394 ("dsa: Replace mii_bus with a generic host device") replaces mii_bus with a generic host_dev, and introduces dsa_host_dev_to_mii_bus() to support conversion from host_dev to mii_bus. However, in some cases it uses to_mii_bus to perform that conversion. Since host_dev is not the phy bus device but typically a platform device, this fails and results in a crash with the affected drivers. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81781d35>] __mutex_lock_slowpath+0x75/0x100 PGD 406783067 PUD 406784067 PMD 0 Oops: 0002 [#1] SMP ... Call Trace: [<ffffffff810a538b>] ? pick_next_task_fair+0x61b/0x880 [<ffffffff81781de3>] mutex_lock+0x23/0x37 [<ffffffff81533244>] mdiobus_read+0x34/0x60 [<ffffffff8153b95a>] __mv88e6xxx_reg_read+0x8a/0xa0 [<ffffffff8153b9bc>] mv88e6xxx_reg_read+0x4c/0xa0 Fixes: b4d2394 ("dsa: Replace mii_bus with a generic host device") Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>

A platform driver for which nothing ever registers the corresponding platform device. Also it was driving the same hardware as sead3-i2c-drv.c so redundant anyway and couldn't co-exist with that driver because each of them was using a private spinlock to protect access to the same hardware resources. This also fixes a randconfig problem: arch/mips/mti-sead3/sead3-pic32-i2c-drv.c: In function 'i2c_platform_probe': arch/mips/mti-sead3/sead3-pic32-i2c-drv.c:345:2: error: implicit declaration of function 'i2c_add_numbered_adapter' [-Werror=implicit-function-declaration] ret = i2c_add_numbered_adapter(&priv->adap); ^ arch/mips/mti-sead3/sead3-pic32-i2c-drv.c: In function 'i2c_platform_remove': arch/mips/mti-sead3/sead3-pic32-i2c-drv.c:361:2: error: implicit declaration of function 'i2c_del_adapter' [-Werror=implicit-function-declaration] i2c_del_adapter(&priv->adap); Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

A failure to decode the instruction can cause a NULL pointer access. This is fixed simply by moving the "done" label as close as possible to the return. This fixes CVE-2014-8481. Reported-by: Andy Lutomirski <luto@amacapital.net> Cc: stable@vger.kernel.org Fixes: 41061cd Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Currently, all group15 instructions are decoded as clflush (e.g., mfence, xsave). In addition, the clflush instruction requires no prefix (66/f2/f3) would exist. If prefix exists it may encode a different instruction (e.g., clflushopt). Creating a group for clflush, and different group for each prefix. This has been the case forever, but the next patch needs the cflush group in order to fix a bug introduced in 3.17. Fixes: 41061cd Cc: stable@vger.kernel.org Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

The decode phase of the x86 emulator assumes that every instruction with the ModRM flag, and which can be used with RIP-relative addressing, has either SrcMem or DstMem. This is not the case for several instructions - prefetch, hint-nop and clflush. Adding SrcMem|NoAccess for prefetch and hint-nop and SrcMem for clflush. This fixes CVE-2014-8480. Fixes: 41061cd Cc: stable@vger.kernel.org Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

The third parameter of kvm_unpin_pages() when called from kvm_iommu_map_pages() is wrong, it should be the number of pages to un-pin and not the page size. This error was facilitated with an inconsistent API: kvm_pin_pages() takes a size, but kvn_unpin_pages() takes a number of pages, so fix the problem by matching the two. This was introduced by commit 350b8bd ("kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601)"), which fixes the lack of un-pinning for pages intended to be un-pinned (i.e. memory leak) but unfortunately potentially aggravated the number of pages we un-pin that should have stayed pinned. As far as I understand though, the same practical mitigations apply. This issue was found during review of Red Hat 6.6 patches to prepare Ksplice rebootless updates. Thanks to Vegard for his time on a late Friday evening to help me in understanding this code. Fixes: 350b8bd ("kvm: iommu: fix the third parameter of... (CVE-2014-3601)") Cc: stable@vger.kernel.org Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Jamie Iles <jamie.iles@oracle.com> Reviewed-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Even after the recent fix, the assertion on paging_tmpl.h is triggered. Apparently, the assertion wants to check that the PAE is always set on long-mode, but does it in incorrect way. Note that the assertion is not enabled unless the code is debugged by defining MMU_DEBUG. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

After commit 80ce163 (KVM: VFIO: register kvm_device_ops dynamically), kvm_device_ops of vfio can be registered dynamically. Commit 3c3c29f (kvm-vfio: do not use module_init) move the dynamic register invoked by kvm_init in order to fix broke unloading of the kvm module. However, kvm_device_ops of vfio is unregistered after rmmod kvm-intel module which lead to device type collision detection warning after kvm-intel module reinsmod. WARNING: CPU: 1 PID: 10358 at /root/cathy/kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:3289 kvm_init+0x234/0x282 [kvm]() Modules linked in: kvm_intel(O+) kvm(O) nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache lockd sunrpc pci_stub bridge stp llc autofs4 8021q cpufreq_ondemand ipv6 joydev microcode pcspkr igb i2c_algo_bit ehci_pci ehci_hcd e1000e i2c_i801 ixgbe ptp pps_core hwmon mdio tpm_tis tpm ipmi_si ipmi_msghandler acpi_cpufreq isci libsas scsi_transport_sas button dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kvm_intel] CPU: 1 PID: 10358 Comm: insmod Tainted: G W O 3.17.0-rc1 #2 Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013 0000000000000cd9 ffff880ff08cfd18 ffffffff814a61d9 0000000000000cd9 0000000000000000 ffff880ff08cfd58 ffffffff810417b7 ffff880ff08cfd48 ffffffffa045bcac ffffffffa049c420 0000000000000040 00000000000000ff Call Trace: [<ffffffff814a61d9>] dump_stack+0x49/0x60 [<ffffffff810417b7>] warn_slowpath_common+0x7c/0x96 [<ffffffffa045bcac>] ? kvm_init+0x234/0x282 [kvm] [<ffffffff810417e6>] warn_slowpath_null+0x15/0x17 [<ffffffffa045bcac>] kvm_init+0x234/0x282 [kvm] [<ffffffffa016e995>] vmx_init+0x1bf/0x42a [kvm_intel] [<ffffffffa016e7d6>] ? vmx_check_processor_compat+0x64/0x64 [kvm_intel] [<ffffffff810002ab>] do_one_initcall+0xe3/0x170 [<ffffffff811168a9>] ? __vunmap+0xad/0xb8 [<ffffffff8109c58f>] do_init_module+0x2b/0x174 [<ffffffff8109d414>] load_module+0x43e/0x569 [<ffffffff8109c6d8>] ? do_init_module+0x174/0x174 [<ffffffff8109c75a>] ? copy_module_from_user+0x39/0x82 [<ffffffff8109b7dd>] ? module_sect_show+0x20/0x20 [<ffffffff8109d65f>] SyS_init_module+0x54/0x81 [<ffffffff814a9a12>] system_call_fastpath+0x16/0x1b ---[ end trace 0626f4a3ddea56f3 ]--- The bug can be reproduced by: rmmod kvm_intel.ko insmod kvm_intel.ko without rmmod/insmod kvm.ko This patch fixes the bug by unregistering kvm_device_ops of vfio when the kvm-intel module is removed. Reported-by: Liu Rongrong <rongrongx.liu@intel.com> Fixes: 3c3c29f Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

This isn't a module and shouldn't be one. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

When user asks to turn off ASLR by writing "0" to /proc/sys/kernel/randomize_va_space there should not be any randomization to mmap base, stack, VDSO, libs, text and heap Currently arm64 violates this behavior by randomising text. Fix this by defining a constant ELF_ET_DYN_BASE. The randomisation of mm->mmap_base is done by setup_new_exec -> arch_pick_mmap_layout -> mmap_base -> mmap_rnd. Signed-off-by: Arun Chandran <achandran@mvista.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

…g boot. Meelis Roos reported that kernels built with gcc-4.9 do not boot, we eventually narrowed this down to only impacting machines using UltraSPARC-III and derivitive cpus. The crash happens right when the first user process is spawned: [ 54.451346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 [ 54.451346] [ 54.571516] CPU: 1 PID: 1 Comm: init Not tainted 3.16.0-rc2-00211-gd7933ab #96 [ 54.666431] Call Trace: [ 54.698453] [0000000000762f8c] panic+0xb0/0x224 [ 54.759071] [000000000045cf68] do_exit+0x948/0x960 [ 54.823123] [000000000042cbc0] fault_in_user_windows+0xe0/0x100 [ 54.902036] [0000000000404ad0] __handle_user_windows+0x0/0x10 [ 54.978662] Press Stop-A (L1-A) to return to the boot prom [ 55.050713] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 Further investigation showed that compiling only per_cpu_patch() with an older compiler fixes the boot. Detailed analysis showed that the function is not being miscompiled by gcc-4.9, but it is using a different register allocation ordering. With the gcc-4.9 compiled function, something during the code patching causes some of the %i* input registers to get corrupted. Perhaps we have a TLB miss path into the firmware that is deep enough to cause a register window spill and subsequent restore when we get back from the TLB miss trap. Let's plug this up by doing two things: 1) Stop using the firmware stack for client interface calls into the firmware. Just use the kernel's stack. 2) As soon as we can, call into a new function "start_early_boot()" to put a one-register-window buffer between the firmware's deepest stack frame and the top-most initial kernel one. Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net>

It is not sufficient to only implement get_user_pages_fast(), you must also implement the atomic version __get_user_pages_fast() otherwise you end up using the weak symbol fallback implementation which simply returns zero. This is dangerous, because it causes the futex code to loop forever if transparent hugepages are supported (see get_futex_key()). Signed-off-by: David S. Miller <davem@davemloft.net>

With 48-bit VA space, the 64K page configuration uses 3 levels instead of 2 and PUD_SIZE != PMD_SIZE. Since with 64K pages we only cover PMD_SIZE with the initial swapper_pg_dir populated in head.S, the memblock current_limit needs to be set accordingly in map_mem() to avoid allocating unmapped memory. The memblock current_limit is progressively increased as more blocks are mapped. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

…rzhang/linux Pull thermal management updates from Zhang Rui: "Sorry that I missed the merge window as there is a bug found in the last minute, and I have to fix it and wait for the code to be tested in linux-next tree for a few days. Now the buggy patch has been dropped entirely from my next branch. Thus I hope those changes can still be merged in 3.18-rc2 as most of them are platform thermal driver changes. Specifics: - introduce ACPI INT340X thermal drivers. Newer laptops and tablets may have thermal sensors and other devices with thermal control capabilities that are exposed for the OS to use via the ACPI INT340x device objects. Several drivers are introduced to expose the temperature information and cooling ability from these objects to user-space via the normal thermal framework. From: Lu Aaron, Lan Tianyu, Jacob Pan and Zhang Rui. - introduce a new thermal governor, which just uses a hysteresis to switch abruptly on/off a cooling device. This governor can be used to control certain fan devices that can not be throttled but just switched on or off. From: Peter Feuerer. - introduce support for some new thermal interrupt functions on i.MX6SX, in IMX thermal driver. From: Anson, Huang. - introduce tracing support on thermal framework. From: Punit Agrawal. - small fixes in OF thermal and thermal step_wise governor" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (25 commits) Thermal: int340x thermal: select ACPI fan driver Thermal: int3400_thermal: use acpi_thermal_rel parsing APIs Thermal: int340x_thermal: expose acpi thermal relationship tables Thermal: introduce int3403 thermal driver Thermal: introduce INT3402 thermal driver Thermal: move the KELVIN_TO_MILLICELSIUS macro to thermal.h ACPI / Fan: support INT3404 thermal device ACPI / Fan: add ACPI 4.0 style fan support ACPI / fan: convert to platform driver ACPI / fan: use acpi_device_xxx_power instead of acpi_bus equivelant ACPI / fan: remove no need check for device pointer ACPI / fan: remove unused macro Thermal: int3400 thermal: register to thermal framework Thermal: int3400 thermal: add capability to detect supporting UUIDs Thermal: introduce int3400 thermal driver ACPI: add ACPI_TYPE_LOCAL_REFERENCE support to acpi_extract_package() ACPI: make acpi_create_platform_device() an external API thermal: step_wise: fix: Prevent from binary overflow when trend is dropping ACPI: introduce ACPI int340x thermal scan handler thermal: Added Bang-bang thermal governor ...

…rnel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: "This is material that didn't make it to my 3.18-rc1 pull request for various reasons, mostly related to timing and travel (LinuxCon EU / LPC) plus a couple of fixes for recent bugs. The only really new thing here is the PM QoS class for memory bandwidth, but it is simple enough and users of it will be added in the next cycle. One major change in behavior is that platform devices enumerated by ACPI will use 32-bit DMA mask by default. Also included is an ACPICA update to a new upstream release, but that's mostly cleanups, changes in tools and similar. The rest is fixes and cleanups mostly. Specifics: - Fix for a recent PCI power management change that overlooked the fact that some IRQ chips might not be able to configure PCIe PME for system wakeup from Lucas Stach. - Fix for a bug introduced in 3.17 where acpi_device_wakeup() is called with a wrong ordering of arguments from Zhang Rui. - A bunch of intel_pstate driver fixes (all -stable candidates) from Dirk Brandewie, Gabriele Mazzotta and Pali Rohár. - Fixes for a rather long-standing problem with the OOM killer and the freezer that frozen processes killed by the OOM do not actually release any memory until they are thawed, so OOM-killing them is rather pointless, with a couple of cleanups on top (Michal Hocko, Cong Wang, Rafael J Wysocki). - ACPICA update to upstream release 20140926, inlcuding mostly cleanups reducing differences between the upstream ACPICA and the kernel code, tools changes (acpidump, acpiexec) and support for the _DDN object (Bob Moore, Lv Zheng). - New PM QoS class for memory bandwidth from Tomeu Vizoso. - Default 32-bit DMA mask for platform devices enumerated by ACPI (this change is mostly needed for some drivers development in progress targeted at 3.19) from Heikki Krogerus. - ACPI EC driver cleanups, mostly related to debugging, from Lv Zheng. - cpufreq-dt driver updates from Thomas Petazzoni. - powernv cpuidle driver update from Preeti U Murthy" * tag 'pm+acpi-3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (34 commits) intel_pstate: Correct BYT VID values. intel_pstate: Fix BYT frequency reporting intel_pstate: Don't lose sysfs settings during cpu offline cpufreq: intel_pstate: Reflect current no_turbo state correctly cpufreq: expose scaling_cur_freq sysfs file for set_policy() drivers cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy PCI / PM: handle failure to enable wakeup on PCIe PME ACPI: invoke acpi_device_wakeup() with correct parameters PM / freezer: Clean up code after recent fixes PM: convert do_each_thread to for_each_process_thread OOM, PM: OOM killed task shouldn't escape PM suspend freezer: remove obsolete comments in __thaw_task() freezer: Do not freeze tasks killed by OOM killer ACPI / platform: provide default DMA mask cpuidle: powernv: Populate cpuidle state details by querying the device-tree cpufreq: cpufreq-dt: adjust message related to regulators cpufreq: cpufreq-dt: extend with platform_data cpufreq: allow driver-specific data ACPI / EC: Cleanup coding style. ACPI / EC: Refine event/query debugging messages. ...

…rnel/git/tytso/random Pull /dev/random updates from Ted Ts'o: "This adds a memzero_explicit() call which is guaranteed not to be optimized away by GCC. This is important when we are wiping cryptographically sensitive material" * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: crypto: memzero_explicit - make sure to clear out sensitive data random: add and use memzero_explicit() for clearing data

…el/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Here are a chunk of small fixes since rc1: two PCM core fixes, one is a long-standing annoyance about lockdep and another is an ARM64 mmap fix. The rest are a HD-audio HDMI hotplug notification fix, a fix for missing NULL termination in Realtek codec quirks and a few new device/codec-specific quirks as usual" * tag 'sound-3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Add missing terminating entry to SND_HDA_PIN_QUIRK macro ALSA: pcm: Fix false lockdep warnings ALSA: hda - Fix inverted LED gpio setup for Lenovo Ideapad ALSA: hda - hdmi: Fix missing ELD change event on plug/unplug ALSA: usb-audio: Add support for Steinberg UR22 USB interface ALSA: ALC283 codec - Avoid pop noise on headphones during suspend/resume ALSA: pcm: use the same dma mmap codepath both for arm and arm64

…ub/scm/linux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: - Fix regression in xen_clocksource_read() which caused all Xen guests to crash early in boot. - Several fixes for super rare race conditions in the p2m. - Assorted other minor fixes. * tag 'stable/for-linus-3.18-b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pci: Allocate memory for physdev_pci_device_add's optarr x86/xen: panic on bad Xen-provided memory map x86/xen: Fix incorrect per_cpu accessor in xen_clocksource_read() x86/xen: avoid race in p2m handling x86/xen: delay construction of mfn_list_list x86/xen: avoid writing to freed memory after race in p2m handling xen/balloon: Don't continue ballooning when BP_ECANCELED is encountered

Pull kvm fixes from Paolo Bonzini: "This is a pretty large update. I think it is roughly as big as what I usually had for the _whole_ rc period. There are a few bad bugs where the guest can OOPS or crash the host. We have also started looking at attack models for nested virtualization; bugs that usually result in the guest ring 0 crashing itself become more worrisome if you have nested virtualization, because the nested guest might bring down the non-nested guest as well. For current uses of nested virtualization these do not really have a security impact, but you never know and bugs are bugs nevertheless. A lot of these bugs are in 3.17 too, resulting in a large number of stable@ Ccs. I checked that all the patches apply there with no conflicts" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: vfio: fix unregister kvm_device_ops of vfio KVM: x86: Wrong assertion on paging_tmpl.h kvm: fix excessive pages un-pinning in kvm_iommu_map error path. KVM: x86: PREFETCH and HINT_NOP should have SrcMem flag KVM: x86: Emulator does not decode clflush well KVM: emulate: avoid accessing NULL ctxt->memopp KVM: x86: Decoding guest instructions which cross page boundary may fail kvm: x86: don't kill guest on unknown exit reason kvm: vmx: handle invvpid vm exit gracefully KVM: x86: Handle errors when RIP is set during far jumps KVM: x86: Emulator fixes for eip canonical checks on near branches KVM: x86: Fix wrong masking on relative jump/call KVM: x86: Improve thread safety in pit KVM: x86: Prevent host from panicking on shared MSR writes. KVM: x86: Check non-canonical addresses upon WRMSR

Pull two sparc fixes from David Miller: 1) Fix boots with gcc-4.9 compiled sparc64 kernels. 2) Add missing __get_user_pages_fast() on sparc64 to fix hangs on futexes used in transparent hugepage areas. It's really idiotic to have a weak symbolled fallback that just returns zero, and causes this kind of bug. There should be no backup implementation and the link should fail if the architecture fails to provide __get_user_pages_fast() and supports transparent hugepages. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Implement __get_user_pages_fast(). sparc64: Fix register corruption in top-most kernel stack frame during boot.

…git/arm64/linux Pull arm64 fixes from Catalin Marinas: - enable 48-bit VA space now that KVM has been fixed, together with a couple of fixes for pgd allocation alignment and initial memblock current_limit. There is still a dependency on !ARM_SMMU which needs to be updated as it uses the page table manipulation macros of the host kernel - eBPF fixes following changes/conflicts during the merging window - Compat types affecting compat_elf_prpsinfo - Compilation error on UP builds - ASLR fix when /proc/sys/kernel/randomize_va_space == 0 - DT definitions for CLCD support on ARMv8 model platform * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix memblock current_limit with 64K pages and 48-bit VA arm64: ASLR: Don't randomise text when randomise_va_space == 0 arm64: vexpress: Add CLCD support to the ARMv8 model platform arm64: Fix compilation error on UP builds Documentation/arm64/memory.txt: fix typo net: bpf: arm64: minor fix of type in jited arm64: bpf: add 'load 64-bit immediate' instruction arm64: bpf: add 'shift by register' instructions net: bpf: arm64: address randomize and write protect JIT code arm64: mm: Correct fixmap pagetable types arm64: compat: fix compat types affecting struct compat_elf_prpsinfo arm64: Align less than PAGE_SIZE pgds naturally arm64: Allow 48-bits VA space without ARM_SMMU

…ream-linus Pull MIPS fixes from Ralf Baechle: "This is the first round of fixes and tying up loose ends for MIPS. - plenty of fixes for build errors in specific obscure configurations - remove redundant code on the Lantiq platform - removal of a useless SEAD I2C driver that was causing a build issue - fix an earlier TLB exeption handler fix to also work on Octeon. - fix ISA level dependencies in FPU emulator's instruction decoding. - don't hardcode kernel command line in Octeon software emulator. - fix an earlier fix for the Loondson 2 clock setting" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: SEAD3: Fix I2C device registration. MIPS: SEAD3: Nuke PIC32 I2C driver. MIPS: ftrace: Fix a microMIPS build problem MIPS: MSP71xx: Fix build error MIPS: Malta: Do not build the malta-amon.c file if CMP is not enabled MIPS: Prevent compiler warning from cop2_{save,restore} MIPS: Kconfig: Add missing MIPS_CPS dependencies to PM and cpuidle MIPS: idle: Remove leftover __pastwait symbol and its references MIPS: Sibyte: Include the swarm subdir to the sb1250 LittleSur builds MIPS: ptrace.h: Add a missing include MIPS: ath79: Fix compilation error when CONFIG_PCI is disabled MIPS: MSP71xx: Remove compilation error when CONFIG_MIPS_MT is present MIPS: Octeon: Remove special case for simulator command line. MIPS: tlbex: Properly fix HUGE TLB Refill exception handler MIPS: loongson2_cpufreq: Fix CPU clock rate setting mismerge pci: pci-lantiq: remove duplicate check on resource MIPS: Lasat: Add missing CONFIG_PROC_FS dependency to PICVUE_PROC MIPS: cp1emu: Fix ISA restrictions for cop1x_op instructions

同步torvalds的内核修改。

cat /sys/.../pools followed by removal the device leads to: |====================================================== |[ INFO: possible circular locking dependency detected ] |3.17.0-rc4+ #1498 Not tainted |------------------------------------------------------- |rmmod/2505 is trying to acquire lock: | (s_active#28){++++.+}, at: [<c017f754>] kernfs_remove_by_name_ns+0x3c/0x88 | |but task is already holding lock: | (pools_lock){+.+.+.}, at: [<c011494c>] dma_pool_destroy+0x18/0x17c | |which lock already depends on the new lock. |the existing dependency chain (in reverse order) is: | |-> #1 (pools_lock){+.+.+.}: | [<c0114ae8>] show_pools+0x30/0xf8 | [<c0313210>] dev_attr_show+0x1c/0x48 | [<c0180e84>] sysfs_kf_seq_show+0x88/0x10c | [<c017f960>] kernfs_seq_show+0x24/0x28 | [<c013efc4>] seq_read+0x1b8/0x480 | [<c011e820>] vfs_read+0x8c/0x148 | [<c011ea10>] SyS_read+0x40/0x8c | [<c000e960>] ret_fast_syscall+0x0/0x48 | |-> #0 (s_active#28){++++.+}: | [<c017e9ac>] __kernfs_remove+0x258/0x2ec | [<c017f754>] kernfs_remove_by_name_ns+0x3c/0x88 | [<c0114a7c>] dma_pool_destroy+0x148/0x17c | [<c03ad288>] hcd_buffer_destroy+0x20/0x34 | [<c03a4780>] usb_remove_hcd+0x110/0x1a4 The problem is the lock order of pools_lock and kernfs_mutex in dma_pool_destroy() vs show_pools() call path. This patch breaks out the creation of the sysfs file outside of the pools_lock mutex. The newly added pools_reg_lock ensures that there is no race of create vs destroy code path in terms whether or not the sysfs file has to be deleted (and was it deleted before we try to create a new one) and what to do if device_create_file() failed. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

…_map_entry By the following commits, we prevented from allocating firmware_map_entry of same memory range: f0093ed: drivers/firmware/memmap.c: don't allocate firmware_map_entry of same memory range 49c8b24: drivers/firmware/memmap.c: pass the correct argument to firmware_map_find_entry_bootmem() But it's not enough. When PNP0C80 device is added by acpi_scan_init(), memmap sysfses of same firmware_map_entry are created twice as follows: # cat /sys/firmware/memmap/*/start 0x40000000000 0x60000000000 0x4a837000 0x4a83a000 0x4a8b5000 ... 0x40000000000 0x60000000000 ... The flows of the issues are as follows: 1. e820_reserve_resources() allocates firmware_map_entrys of all memory ranges defined in e820. And, these firmware_map_entrys are linked with map_entries list. map_entries -> entry 1 -> ... -> entry N 2. When PNP0C80 device is limited by mem= boot option, acpi_scan_init() added the memory device. In this case, firmware_map_add_hotplug() allocates firmware_map_entry and creates memmap sysfs. map_entries -> entry 1 -> ... -> entry N -> entry N+1 | memmap 1 3. firmware_memmap_init() creates memmap sysfses of firmware_map_entrys linked with map_entries. map_entries -> entry 1 -> ... -> entry N -> entry N+1 | | | memmap 2 memmap N+1 memmap 1 memmap N+2 So while hot removing the PNP0C80 device, kernel panic occurs as follows: BUG: unable to handle kernel paging request at 00000001003e000b IP: sysfs_open_file+0x46/0x2b0 PGD 203a89fe067 PUD 0 Oops: 0000 [#1] SMP ... Call Trace: do_dentry_open+0x1ef/0x2a0 finish_open+0x31/0x40 do_last+0x57c/0x1220 path_openat+0xc2/0x4c0 do_filp_open+0x4b/0xb0 do_sys_open+0xf3/0x1f0 SyS_open+0x1e/0x20 system_call_fastpath+0x16/0x1b The patch adds a check of confirming whether memmap sysfs of firmware_map_entry has been created, and does not create memmap sysfs of same firmware_map_entry. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

…ap() Error report likely result in IO so it is bad idea to do it from atomic context. This patch should fix following issue: BUG: sleeping function called from invalid context at include/linux/buffer_head.h:349 in_atomic(): 1, irqs_disabled(): 0, pid: 137, name: kworker/u128:1 5 locks held by kworker/u128:1/137: #0: ("writeback"){......}, at: [<ffffffff81085618>] process_one_work+0x228/0x4d0 #1: ((&(&wb->dwork)->work)){......}, at: [<ffffffff81085618>] process_one_work+0x228/0x4d0 #2: (jbd2_handle){......}, at: [<ffffffff81242622>] start_this_handle+0x712/0x7b0 #3: (&ei->i_data_sem){......}, at: [<ffffffff811fa387>] ext4_map_blocks+0x297/0x430 #4: (&(&bgl->locks[i].lock)->rlock){......}, at: [<ffffffff811f3180>] ext4_read_block_bitmap_nowait+0x5d0/0x630 CPU: 3 PID: 137 Comm: kworker/u128:1 Not tainted 3.17.0-rc2-00184-g82752e4 torvalds#165 Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011 Workqueue: writeback bdi_writeback_workfn (flush-1:0) 0000000000000411 ffff880813777288 ffffffff815c7fdc ffff880813777288 ffff880813a8bba0 ffff8808137772a8 ffffffff8108fb30 ffff880803e01e38 ffff880803e01e38 ffff8808137772c8 ffffffff811a8d53 ffff88080ecc6000 Call Trace: [<ffffffff815c7fdc>] dump_stack+0x51/0x6d [<ffffffff8108fb30>] __might_sleep+0xf0/0x100 [<ffffffff811a8d53>] __sync_dirty_buffer+0x43/0xe0 [<ffffffff811a8e03>] sync_dirty_buffer+0x13/0x20 [<ffffffff8120f581>] ext4_commit_super+0x1d1/0x230 [<ffffffff8120fa03>] save_error_info+0x23/0x30 [<ffffffff8120fd06>] __ext4_error+0xb6/0xd0 [<ffffffff8120f260>] ? ext4_group_desc_csum+0x140/0x190 [<ffffffff811f2d8c>] ext4_read_block_bitmap_nowait+0x1dc/0x630 [<ffffffff8122e23a>] ext4_mb_init_cache+0x21a/0x8f0 [<ffffffff8113ae95>] ? lru_cache_add+0x55/0x60 [<ffffffff8112e16c>] ? add_to_page_cache_lru+0x6c/0x80 [<ffffffff8122eaa0>] ext4_mb_init_group+0x190/0x280 [<ffffffff8122ec51>] ext4_mb_good_group+0xc1/0x190 [<ffffffff8123309a>] ext4_mb_regular_allocator+0x17a/0x410 [<ffffffff8122c821>] ? ext4_mb_use_preallocated+0x31/0x380 [<ffffffff81233535>] ? ext4_mb_new_blocks+0x205/0x8e0 [<ffffffff8116ed5c>] ? kmem_cache_alloc+0xfc/0x180 [<ffffffff812335b0>] ext4_mb_new_blocks+0x280/0x8e0 [<ffffffff8116f2c4>] ? __kmalloc+0x144/0x1c0 [<ffffffff81221797>] ? ext4_find_extent+0x97/0x320 [<ffffffff812257f4>] ext4_ext_map_blocks+0xbc4/0x1050 [<ffffffff811fa387>] ? ext4_map_blocks+0x297/0x430 [<ffffffff811fa3ab>] ext4_map_blocks+0x2bb/0x430 [<ffffffff81200e43>] ? ext4_init_io_end+0x23/0x50 [<ffffffff811feb44>] ext4_writepages+0x564/0xaf0 [<ffffffff815cde3b>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffff810ac7bd>] ? lock_release_non_nested+0x2fd/0x3c0 [<ffffffff811a009e>] ? writeback_sb_inodes+0x10e/0x490 [<ffffffff811a009e>] ? writeback_sb_inodes+0x10e/0x490 [<ffffffff811377e3>] do_writepages+0x23/0x40 [<ffffffff8119c8ce>] __writeback_single_inode+0x9e/0x280 [<ffffffff811a026b>] writeback_sb_inodes+0x2db/0x490 [<ffffffff811a0664>] wb_writeback+0x174/0x2d0 [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190 [<ffffffff811a0863>] wb_do_writeback+0xa3/0x200 [<ffffffff811a0a40>] bdi_writeback_workfn+0x80/0x230 [<ffffffff81085618>] ? process_one_work+0x228/0x4d0 [<ffffffff810856cd>] process_one_work+0x2dd/0x4d0 [<ffffffff81085618>] ? process_one_work+0x228/0x4d0 [<ffffffff81085c1d>] worker_thread+0x35d/0x460 [<ffffffff810858c0>] ? process_one_work+0x4d0/0x4d0 [<ffffffff810858c0>] ? process_one_work+0x4d0/0x4d0 [<ffffffff8108a885>] kthread+0xf5/0x100 [<ffffffff810990e5>] ? local_clock+0x25/0x30 [<ffffffff8108a790>] ? __init_kthread_worker+0x70/0x70 [<ffffffff815ce2ac>] ret_from_fork+0x7c/0xb0 [<ffffffff8108a790>] ? __init_kthread_work Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org

…rder detected remove spinlock in cpdma_desc_pool_destroy() as there is no active cpdma channel and iounmap should be called without auquiring lock. root@dra7xx-evm:~# modprobe -r ti_cpsw [ 50.539743] [ 50.541312] ====================================================== [ 50.547796] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ] [ 50.554826] 3.14.19-02124-g95c5b7b torvalds#308 Not tainted [ 50.559939] ------------------------------------------------------ [ 50.566416] modprobe/1921 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 50.573347] (vmap_area_lock){+.+...}, at: [<c01127fc>] find_vmap_area+0x10/0x6c [ 50.581132] [ 50.581132] and this task is already holding: [ 50.587249] (&(&pool->lock)->rlock#2){..-...}, at: [<bf017c74>] cpdma_ctlr_destroy+0x5c/0x114 [davinci_cpdma] [ 50.597766] which would create a new lock dependency: [ 50.603048] (&(&pool->lock)->rlock#2){..-...} -> (vmap_area_lock){+.+...} [ 50.610296] [ 50.610296] but this new dependency connects a SOFTIRQ-irq-safe lock: [ 50.618601] (&(&pool->lock)->rlock#2){..-...} ... which became SOFTIRQ-irq-safe at: [ 50.626829] [<c06585a4>] _raw_spin_lock_irqsave+0x38/0x4c [ 50.632677] [<bf01773c>] cpdma_desc_free.constprop.7+0x28/0x58 [davinci_cpdma] [ 50.640437] [<bf0177e8>] __cpdma_chan_free+0x7c/0xa8 [davinci_cpdma] [ 50.647289] [<bf017908>] __cpdma_chan_process+0xf4/0x134 [davinci_cpdma] [ 50.654512] [<bf017984>] cpdma_chan_process+0x3c/0x54 [davinci_cpdma] [ 50.661455] [<bf0277e8>] cpsw_poll+0x14/0xa8 [ti_cpsw] [ 50.667038] [<c05844f4>] net_rx_action+0xc0/0x1e8 [ 50.672150] [<c0048234>] __do_softirq+0xcc/0x304 [ 50.677183] [<c004873c>] irq_exit+0xa8/0xfc [ 50.681751] [<c000eeac>] handle_IRQ+0x50/0xb0 [ 50.686513] [<c0008638>] gic_handle_irq+0x28/0x5c [ 50.691628] [<c06590a4>] __irq_svc+0x44/0x5c [ 50.696289] [<c0658ab4>] _raw_spin_unlock_irqrestore+0x34/0x44 [ 50.702591] [<c065a9c4>] do_page_fault.part.9+0x144/0x3c4 [ 50.708433] [<c065acb8>] do_page_fault+0x74/0x84 [ 50.713453] [<c00083dc>] do_DataAbort+0x34/0x98 [ 50.718391] [<c065923c>] __dabt_usr+0x3c/0x40 [ 50.723148] [ 50.723148] to a SOFTIRQ-irq-unsafe lock: [ 50.728893] (vmap_area_lock){+.+...} ... which became SOFTIRQ-irq-unsafe at: [ 50.736476] ... [<c06584e8>] _raw_spin_lock+0x28/0x38 [ 50.741876] [<c011376c>] alloc_vmap_area.isra.28+0xb8/0x300 [ 50.747908] [<c0113a44>] __get_vm_area_node.isra.29+0x90/0x134 [ 50.754210] [<c011486c>] get_vm_area_caller+0x3c/0x48 [ 50.759692] [<c0114be0>] vmap+0x40/0x78 [ 50.763900] [<c09442f0>] check_writebuffer_bugs+0x54/0x1a0 [ 50.769835] [<c093eac0>] start_kernel+0x320/0x388 [ 50.774952] [<80008074>] 0x80008074 [ 50.778793] [ 50.778793] other info that might help us debug this: [ 50.778793] [ 50.787181] Possible interrupt unsafe locking scenario: [ 50.787181] [ 50.794295] CPU0 CPU1 [ 50.799042] ---- ---- [ 50.803785] lock(vmap_area_lock); [ 50.807446] local_irq_disable(); [ 50.813652] lock(&(&pool->lock)->rlock#2); [ 50.820782] lock(vmap_area_lock); [ 50.827086] <Interrupt> [ 50.829823] lock(&(&pool->lock)->rlock#2); [ 50.834490] [ 50.834490] *** DEADLOCK *** [ 50.834490] [ 50.840695] 4 locks held by modprobe/1921: [ 50.844981] #0: (&__lockdep_no_validate__){......}, at: [<c03e53e8>] driver_detach+0x44/0xb8 [ 50.854038] #1: (&__lockdep_no_validate__){......}, at: [<c03e53f4>] driver_detach+0x50/0xb8 [ 50.863102] #2: (&(&ctlr->lock)->rlock){......}, at: [<bf017c34>] cpdma_ctlr_destroy+0x1c/0x114 [davinci_cpdma] [ 50.873890] #3: (&(&pool->lock)->rlock#2){..-...}, at: [<bf017c74>] cpdma_ctlr_destroy+0x5c/0x114 [davinci_cpdma] [ 50.884871] the dependencies between SOFTIRQ-irq-safe lock and the holding lock: [ 50.892827] -> (&(&pool->lock)->rlock#2){..-...} ops: 167 { [ 50.898703] IN-SOFTIRQ-W at: [ 50.901995] [<c06585a4>] _raw_spin_lock_irqsave+0x38/0x4c [ 50.909476] [<bf01773c>] cpdma_desc_free.constprop.7+0x28/0x58 [davinci_cpdma] [ 50.918878] [<bf0177e8>] __cpdma_chan_free+0x7c/0xa8 [davinci_cpdma] [ 50.927366] [<bf017908>] __cpdma_chan_process+0xf4/0x134 [davinci_cpdma] [ 50.936218] [<bf017984>] cpdma_chan_process+0x3c/0x54 [davinci_cpdma] [ 50.944794] [<bf0277e8>] cpsw_poll+0x14/0xa8 [ti_cpsw] [ 50.952009] [<c05844f4>] net_rx_action+0xc0/0x1e8 [ 50.958765] [<c0048234>] __do_softirq+0xcc/0x304 [ 50.965432] [<c004873c>] irq_exit+0xa8/0xfc [ 50.971635] [<c000eeac>] handle_IRQ+0x50/0xb0 [ 50.978035] [<c0008638>] gic_handle_irq+0x28/0x5c [ 50.984788] [<c06590a4>] __irq_svc+0x44/0x5c [ 50.991085] [<c0658ab4>] _raw_spin_unlock_irqrestore+0x34/0x44 [ 50.999023] [<c065a9c4>] do_page_fault.part.9+0x144/0x3c4 [ 51.006510] [<c065acb8>] do_page_fault+0x74/0x84 [ 51.013171] [<c00083dc>] do_DataAbort+0x34/0x98 [ 51.019738] [<c065923c>] __dabt_usr+0x3c/0x40 [ 51.026129] INITIAL USE at: [ 51.029335] [<c06585a4>] _raw_spin_lock_irqsave+0x38/0x4c [ 51.036729] [<bf017d78>] cpdma_chan_submit+0x4c/0x2f0 [davinci_cpdma] [ 51.045225] [<bf02863c>] cpsw_ndo_open+0x378/0x6bc [ti_cpsw] [ 51.052897] [<c058747c>] __dev_open+0x9c/0x104 [ 51.059287] [<c05876ec>] __dev_change_flags+0x88/0x160 [ 51.066420] [<c05877e4>] dev_change_flags+0x18/0x48 [ 51.073270] [<c05ed51c>] devinet_ioctl+0x61c/0x6e0 [ 51.080029] [<c056ee54>] sock_ioctl+0x5c/0x298 [ 51.086418] [<c01350a4>] do_vfs_ioctl+0x78/0x61c [ 51.092993] [<c01356ac>] SyS_ioctl+0x64/0x74 [ 51.099200] [<c000e580>] ret_fast_syscall+0x0/0x48 [ 51.105956] } [ 51.107696] ... key at: [<bf019000>] __key.21312+0x0/0xfffff650 [davinci_cpdma] [ 51.115912] ... acquired at: [ 51.119019] [<c00899ac>] lock_acquire+0x9c/0x104 [ 51.124138] [<c06584e8>] _raw_spin_lock+0x28/0x38 [ 51.129341] [<c01127fc>] find_vmap_area+0x10/0x6c [ 51.134547] [<c0114960>] remove_vm_area+0x8/0x6c [ 51.139659] [<c0114a7c>] __vunmap+0x20/0xf8 [ 51.144318] [<c001c350>] __arm_iounmap+0x10/0x18 [ 51.149440] [<bf017d08>] cpdma_ctlr_destroy+0xf0/0x114 [davinci_cpdma] [ 51.156560] [<bf026294>] cpsw_remove+0x48/0x8c [ti_cpsw] [ 51.162407] [<c03e62c8>] platform_drv_remove+0x18/0x1c [ 51.168063] [<c03e4c44>] __device_release_driver+0x70/0xc8 [ 51.174094] [<c03e5458>] driver_detach+0xb4/0xb8 [ 51.179212] [<c03e4a6c>] bus_remove_driver+0x4c/0x90 [ 51.184693] [<c00b024c>] SyS_delete_module+0x10c/0x198 [ 51.190355] [<c000e580>] ret_fast_syscall+0x0/0x48 [ 51.195661] [ 51.197217] the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock: [ 51.205986] -> (vmap_area_lock){+.+...} ops: 520 { [ 51.211032] HARDIRQ-ON-W at: [ 51.214321] [<c06584e8>] _raw_spin_lock+0x28/0x38 [ 51.221090] [<c011376c>] alloc_vmap_area.isra.28+0xb8/0x300 [ 51.228750] [<c0113a44>] __get_vm_area_node.isra.29+0x90/0x134 [ 51.236690] [<c011486c>] get_vm_area_caller+0x3c/0x48 [ 51.243811] [<c0114be0>] vmap+0x40/0x78 [ 51.249654] [<c09442f0>] check_writebuffer_bugs+0x54/0x1a0 [ 51.257239] [<c093eac0>] start_kernel+0x320/0x388 [ 51.263994] [<80008074>] 0x80008074 [ 51.269474] SOFTIRQ-ON-W at: [ 51.272769] [<c06584e8>] _raw_spin_lock+0x28/0x38 [ 51.279525] [<c011376c>] alloc_vmap_area.isra.28+0xb8/0x300 [ 51.287190] [<c0113a44>] __get_vm_area_node.isra.29+0x90/0x134 [ 51.295126] [<c011486c>] get_vm_area_caller+0x3c/0x48 [ 51.302245] [<c0114be0>] vmap+0x40/0x78 [ 51.308094] [<c09442f0>] check_writebuffer_bugs+0x54/0x1a0 [ 51.315669] [<c093eac0>] start_kernel+0x320/0x388 [ 51.322423] [<80008074>] 0x80008074 [ 51.327906] INITIAL USE at: [ 51.331112] [<c06584e8>] _raw_spin_lock+0x28/0x38 [ 51.337775] [<c011376c>] alloc_vmap_area.isra.28+0xb8/0x300 [ 51.345352] [<c0113a44>] __get_vm_area_node.isra.29+0x90/0x134 [ 51.353197] [<c011486c>] get_vm_area_caller+0x3c/0x48 [ 51.360224] [<c0114be0>] vmap+0x40/0x78 [ 51.365977] [<c09442f0>] check_writebuffer_bugs+0x54/0x1a0 [ 51.373464] [<c093eac0>] start_kernel+0x320/0x388 [ 51.380131] [<80008074>] 0x80008074 [ 51.385517] } [ 51.387260] ... key at: [<c0a66948>] vmap_area_lock+0x10/0x20 [ 51.393841] ... acquired at: [ 51.396945] [<c00899ac>] lock_acquire+0x9c/0x104 [ 51.402060] [<c06584e8>] _raw_spin_lock+0x28/0x38 [ 51.407266] [<c01127fc>] find_vmap_area+0x10/0x6c [ 51.412478] [<c0114960>] remove_vm_area+0x8/0x6c [ 51.417592] [<c0114a7c>] __vunmap+0x20/0xf8 [ 51.422252] [<c001c350>] __arm_iounmap+0x10/0x18 [ 51.427369] [<bf017d08>] cpdma_ctlr_destroy+0xf0/0x114 [davinci_cpdma] [ 51.434487] [<bf026294>] cpsw_remove+0x48/0x8c [ti_cpsw] [ 51.440336] [<c03e62c8>] platform_drv_remove+0x18/0x1c [ 51.446000] [<c03e4c44>] __device_release_driver+0x70/0xc8 [ 51.452031] [<c03e5458>] driver_detach+0xb4/0xb8 [ 51.457147] [<c03e4a6c>] bus_remove_driver+0x4c/0x90 [ 51.462628] [<c00b024c>] SyS_delete_module+0x10c/0x198 [ 51.468289] [<c000e580>] ret_fast_syscall+0x0/0x48 [ 51.473584] [ 51.475140] [ 51.475140] stack backtrace: [ 51.479703] CPU: 0 PID: 1921 Comm: modprobe Not tainted 3.14.19-02124-g95c5b7b torvalds#308 [ 51.487744] [<c0016090>] (unwind_backtrace) from [<c0012060>] (show_stack+0x10/0x14) [ 51.495865] [<c0012060>] (show_stack) from [<c0652a20>] (dump_stack+0x78/0x94) [ 51.503444] [<c0652a20>] (dump_stack) from [<c0086f18>] (check_usage+0x408/0x594) [ 51.511293] [<c0086f18>] (check_usage) from [<c00870f8>] (check_irq_usage+0x54/0xb0) [ 51.519416] [<c00870f8>] (check_irq_usage) from [<c0088724>] (__lock_acquire+0xe54/0x1b90) [ 51.528077] [<c0088724>] (__lock_acquire) from [<c00899ac>] (lock_acquire+0x9c/0x104) [ 51.536291] [<c00899ac>] (lock_acquire) from [<c06584e8>] (_raw_spin_lock+0x28/0x38) [ 51.544417] [<c06584e8>] (_raw_spin_lock) from [<c01127fc>] (find_vmap_area+0x10/0x6c) [ 51.552726] [<c01127fc>] (find_vmap_area) from [<c0114960>] (remove_vm_area+0x8/0x6c) [ 51.560935] [<c0114960>] (remove_vm_area) from [<c0114a7c>] (__vunmap+0x20/0xf8) [ 51.568693] [<c0114a7c>] (__vunmap) from [<c001c350>] (__arm_iounmap+0x10/0x18) [ 51.576362] [<c001c350>] (__arm_iounmap) from [<bf017d08>] (cpdma_ctlr_destroy+0xf0/0x114 [davinci_cpdma]) [ 51.586494] [<bf017d08>] (cpdma_ctlr_destroy [davinci_cpdma]) from [<bf026294>] (cpsw_remove+0x48/0x8c [ti_cpsw]) [ 51.597261] [<bf026294>] (cpsw_remove [ti_cpsw]) from [<c03e62c8>] (platform_drv_remove+0x18/0x1c) [ 51.606659] [<c03e62c8>] (platform_drv_remove) from [<c03e4c44>] (__device_release_driver+0x70/0xc8) [ 51.616237] [<c03e4c44>] (__device_release_driver) from [<c03e5458>] (driver_detach+0xb4/0xb8) [ 51.625264] [<c03e5458>] (driver_detach) from [<c03e4a6c>] (bus_remove_driver+0x4c/0x90) [ 51.633749] [<c03e4a6c>] (bus_remove_driver) from [<c00b024c>] (SyS_delete_module+0x10c/0x198) [ 51.642781] [<c00b024c>] (SyS_delete_module) from [<c000e580>] (ret_fast_syscall+0x0/0x48) Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>

A panic was seen in the following sitation. There are two threads running on the system. The first thread is a system monitoring thread that is reading /proc/modules. The second thread is loading and unloading a module (in this example I'm using my simple dummy-module.ko). Note, in the "real world" this occurred with the qlogic driver module. When doing this, the following panic occurred: ------------[ cut here ]------------ kernel BUG at kernel/module.c:3739! invalid opcode: 0000 [#1] SMP Modules linked in: binfmt_misc sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw igb gf128mul glue_helper iTCO_wdt iTCO_vendor_support ablk_helper ptp sb_edac cryptd pps_core edac_core shpchp i2c_i801 pcspkr wmi lpc_ich ioatdma mfd_core dca ipmi_si nfsd ipmi_msghandler auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm isci drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_module] CPU: 37 PID: 186343 Comm: cat Tainted: GF O-------------- 3.10.0+ torvalds#7 Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013 task: ffff8807fd2d8000 ti: ffff88080fa7c000 task.ti: ffff88080fa7c000 RIP: 0010:[<ffffffff810d64c5>] [<ffffffff810d64c5>] module_flags+0xb5/0xc0 RSP: 0018:ffff88080fa7fe18 EFLAGS: 00010246 RAX: 0000000000000003 RBX: ffffffffa03b5200 RCX: 0000000000000000 RDX: 0000000000001000 RSI: ffff88080fa7fe38 RDI: ffffffffa03b5000 RBP: ffff88080fa7fe28 R08: 0000000000000010 R09: 0000000000000000 R10: 0000000000000000 R11: 000000000000000f R12: ffffffffa03b5000 R13: ffffffffa03b5008 R14: ffffffffa03b5200 R15: ffffffffa03b5000 FS: 00007f6ae57ef740(0000) GS:ffff88101e7a0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000404f70 CR3: 0000000ffed48000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffffffffa03b5200 ffff8810101e4800 ffff88080fa7fe70 ffffffff810d666c ffff88081e807300 000000002e0f2fbf 0000000000000000 ffff88100f257b00 ffffffffa03b5008 ffff88080fa7ff48 ffff8810101e4800 ffff88080fa7fee0 Call Trace: [<ffffffff810d666c>] m_show+0x19c/0x1e0 [<ffffffff811e4d7e>] seq_read+0x16e/0x3b0 [<ffffffff812281ed>] proc_reg_read+0x3d/0x80 [<ffffffff811c0f2c>] vfs_read+0x9c/0x170 [<ffffffff811c1a58>] SyS_read+0x58/0xb0 [<ffffffff81605829>] system_call_fastpath+0x16/0x1b Code: 48 63 c2 83 c2 01 c6 04 03 29 48 63 d2 eb d9 0f 1f 80 00 00 00 00 48 63 d2 c6 04 13 2d 41 8b 0c 24 8d 50 02 83 f9 01 75 b2 eb cb <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 RIP [<ffffffff810d64c5>] module_flags+0xb5/0xc0 RSP <ffff88080fa7fe18> Consider the two processes running on the system. CPU 0 (/proc/modules reader) CPU 1 (loading/unloading module) CPU 0 opens /proc/modules, and starts displaying data for each module by traversing the modules list via fs/seq_file.c:seq_open() and fs/seq_file.c:seq_read(). For each module in the modules list, seq_read does op->start() <-- this is a pointer to m_start() op->show() <- this is a pointer to m_show() op->stop() <-- this is a pointer to m_stop() The m_start(), m_show(), and m_stop() module functions are defined in kernel/module.c. The m_start() and m_stop() functions acquire and release the module_mutex respectively. ie) When reading /proc/modules, the module_mutex is acquired and released for each module. m_show() is called with the module_mutex held. It accesses the module struct data and attempts to write out module data. It is in this code path that the above BUG_ON() warning is encountered, specifically m_show() calls static char *module_flags(struct module *mod, char *buf) { int bx = 0; BUG_ON(mod->state == MODULE_STATE_UNFORMED); ... The other thread, CPU 1, in unloading the module calls the syscall delete_module() defined in kernel/module.c. The module_mutex is acquired for a short time, and then released. free_module() is called without the module_mutex. free_module() then sets mod->state = MODULE_STATE_UNFORMED, also without the module_mutex. Some additional code is called and then the module_mutex is reacquired to remove the module from the modules list: /* Now we can delete it from the lists */ mutex_lock(&module_mutex); stop_machine(__unlink_module, mod, NULL); mutex_unlock(&module_mutex); This is the sequence of events that leads to the panic. CPU 1 is removing dummy_module via delete_module(). It acquires the module_mutex, and then releases it. CPU 1 has NOT set dummy_module->state to MODULE_STATE_UNFORMED yet. CPU 0, which is reading the /proc/modules, acquires the module_mutex and acquires a pointer to the dummy_module which is still in the modules list. CPU 0 calls m_show for dummy_module. The check in m_show() for MODULE_STATE_UNFORMED passed for dummy_module even though it is being torn down. Meanwhile CPU 1, which has been continuing to remove dummy_module without holding the module_mutex, now calls free_module() and sets dummy_module->state to MODULE_STATE_UNFORMED. CPU 0 now calls module_flags() with dummy_module and ... static char *module_flags(struct module *mod, char *buf) { int bx = 0; BUG_ON(mod->state == MODULE_STATE_UNFORMED); and BOOM. Acquire and release the module_mutex lock around the setting of MODULE_STATE_UNFORMED in the teardown path, which should resolve the problem. Testing: In the unpatched kernel I can panic the system within 1 minute by doing while (true) do insmod dummy_module.ko; rmmod dummy_module.ko; done and while (true) do cat /proc/modules; done in separate terminals. In the patched kernel I was able to run just over one hour without seeing any issues. I also verified the output of panic via sysrq-c and the output of /proc/modules looks correct for all three states for the dummy_module. dummy_module 12661 0 - Unloading 0xffffffffa03a5000 (OE-) dummy_module 12661 0 - Live 0xffffffffa03bb000 (OE) dummy_module 14015 1 - Loading 0xffffffffa03a5000 (OE+) Signed-off-by: Prarit Bhargava <prarit@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org

If dma_async_device_register() returns error and probe should clean up and return error, a NULL pointer exception happens because of dereference of not allocated channel thread: Dmesg log (from early printk): dma-pl330 12680000.pdma: unable to register DMAC DMA pl330_control: removing pch: eeac4000, chan: eeac4014, thread: (null) Unable to handle kernel NULL pointer dereference at virtual address 0000000c pgd = c0004000 [0000000c] *pgd=00000000 Internal error: Oops: 5 [#1] PREEMPT SMP ARM Modules linked in: CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.17.0-rc3-next-20140904-00005-g6cc4c1937d90-dirty torvalds#427 task: ee80a800 ti: ee888000 task.ti: ee888000 PC is at _stop+0x8/0x2c8 LR is at pl330_control+0x70/0x2e8 pc : [<c0205dc8>] lr : [<c020623c>] psr: 60000193 sp : ee889df8 ip : 00000002 fp : 00000000 r10: eeac4014 r9 : ee0e62bc r8 : 00000000 r7 : eeac405c r6 : 60000113 r5 : ee0e6210 r4 : eeac4000 r3 : 00000002 r2 : 00000002 r1 : 00010000 r0 : 00000000 Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel Control: 10c5387d Table: 4000404a DAC: 00000015 Process swapper/0 (pid: 1, stack limit = 0xee888240) Stack: (0xee889df8 to 0xee88a000) 9de0: 00000002 eeac4000 9e00: ee0e6210 eeac4000 ee0e6210 60000113 eeac405c c020623c 00000000 c020725c 9e20: ee889e20 ee889e20 ee0e6210 eeac4080 00200200 00100100 eeac4014 00000020 9e40: ee0e6218 c0208374 00000000 ee9bb340 ee0e6210 00000000 00000000 c0605cd8 9e60: ee970000 c0605c84 ee9700f8 00000000 c05c4270 00000000 00000000 c0203b3c 9e80: ee970000 c06624a8 00000000 c0605c84 00000000 c023f890 ee970000 c0605c84 9ea0: ee970034 00000000 c05b23d0 c023fa3c 00000000 c0605c84 c023f9b0 c023e0d4 9ec0: ee947e78 ee9b9440 c0605c84 eea1e780 c0605acc c023f094 c0513b50 c0605c84 9ee0: c05ecbd8 c0605c84 c05ecbd8 ee11ba40 c0626500 c0240064 00000000 c05ecbd8 9f00: c05ecbd8 c0008964 c040f13c 0000009f c0626500 c057465c ee80a800 60000113 9f20: 00000000 c05efdb0 60000113 00000000 ef7fc89d c0421168 0000008f c003787c 9f40: c0573d6c 00000006 ef7fc8bb 00000006 c05efd50 ef7fc800 c05dfbc4 00000006 9f60: c05c4264 c0626500 0000008f c05c4270 c059b518 c059bcb4 00000006 00000006 9f80: c059b518 c003c08c 00000000 c040091c 00000000 00000000 00000000 00000000 9fa0: 00000000 c0400924 00000000 c000e7b8 00000000 00000000 00000000 00000000 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 c0c0c0c0 c0c0c0c0 [<c0205dc8>] (_stop) from [<c020623c>] (pl330_control+0x70/0x2e8) [<c020623c>] (pl330_control) from [<c0208374>] (pl330_probe+0x594/0x75c) [<c0208374>] (pl330_probe) from [<c0203b3c>] (amba_probe+0xb8/0x120) [<c0203b3c>] (amba_probe) from [<c023f890>] (driver_probe_device+0x10c/0x22c) [<c023f890>] (driver_probe_device) from [<c023fa3c>] (__driver_attach+0x8c/0x90) [<c023fa3c>] (__driver_attach) from [<c023e0d4>] (bus_for_each_dev+0x54/0x88) [<c023e0d4>] (bus_for_each_dev) from [<c023f094>] (bus_add_driver+0xd4/0x1d0) [<c023f094>] (bus_add_driver) from [<c0240064>] (driver_register+0x78/0xf4) [<c0240064>] (driver_register) from [<c0008964>] (do_one_initcall+0x80/0x1d0) [<c0008964>] (do_one_initcall) from [<c059bcb4>] (kernel_init_freeable+0x108/0x1d4) [<c059bcb4>] (kernel_init_freeable) from [<c0400924>] (kernel_init+0x8/0xec) [<c0400924>] (kernel_init) from [<c000e7b8>] (ret_from_fork+0x14/0x3c) Code: e5813010 e12fff1e e92d40f0 e24dd00c (e590200c) ---[ end trace c94b2f4f38dff3bf ]--- This happens because the necessary resources were not yet allocated - no call to pl330_alloc_chan_resources(). Terminate the thread and free channel resource only if channel thread is not NULL. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: <stable@vger.kernel.org> Fixes: 0b94c57 ("DMA: PL330: Add check if device tree compatible") Reviewed-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Vinod Koul <vinod.koul@intel.com>

Fix a NULL pointer dereference after unbinding the driver, if channel resources were not yet allocated (no call to pl330_alloc_chan_resources()): $ echo 12850000.mdma > /sys/bus/amba/drivers/dma-pl330/unbind [ 13.606533] DMA pl330_control: removing pch: eeab6800, chan: eeab6814, thread: (null) [ 13.614472] Unable to handle kernel NULL pointer dereference at virtual address 0000000c [ 13.622537] pgd = ee284000 [ 13.625228] [0000000c] *pgd=6e1e4831, *pte=00000000, *ppte=00000000 [ 13.631482] Internal error: Oops: 17 [#1] PREEMPT SMP ARM [ 13.636859] Modules linked in: [ 13.639903] CPU: 0 PID: 1 Comm: sh Not tainted 3.17.0-rc3-next-20140904-00004-g7020ffc33ca3-dirty torvalds#420 [ 13.649187] task: ee80a800 ti: ee888000 task.ti: ee888000 [ 13.654589] PC is at _stop+0x8/0x2c8 [ 13.658131] LR is at pl330_control+0x70/0x2e8 [ 13.662468] pc : [<c0206028>] lr : [<c020649c>] psr: 60000093 [ 13.662468] sp : ee889e58 ip : 00000001 fp : 000bab70 [ 13.673922] r10: eeab6814 r9 : ee16debc r8 : 00000000 [ 13.679131] r7 : eeab685c r6 : 60000013 r5 : ee16de10 r4 : eeab6800 [ 13.685641] r3 : 00000002 r2 : 00000000 r1 : 00010000 r0 : 00000000 [ 13.692153] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user [ 13.699357] Control: 10c5387d Table: 6e28404a DAC: 00000015 [ 13.705085] Process sh (pid: 1, stack limit = 0xee888240) [ 13.710466] Stack: (0xee889e58 to 0xee88a000) [ 13.714808] 9e40: 00000002 eeab6800 [ 13.722969] 9e60: ee16de10 eeab6800 ee16de10 60000013 eeab685c c020649c 00000000 c040280c [ 13.731128] 9e80: ee889e80 ee889e80 ee16de18 ee16de10 eeab6880 eeab6814 00200200 eeab68a8 [ 13.739287] 9ea0: 00100100 c0208048 00000000 c0409fc4 eea80800 eea808f8 c0605c44 0000000e [ 13.747446] 9ec0: 0000000e eeb3960c eeb39600 c0203c48 eea80800 c0605c44 c0605a8c c023f694 [ 13.755605] 9ee0: ee80a800 eea80834 eea80800 c023f704 ee80a800 eea80800 c0605c44 c023e8ec [ 13.763764] 9f00: 0000000e ee149780 ee29e580 ee889f80 ee29e580 c023e19c 0000000e c01167e4 [ 13.771923] 9f20: c01167a0 00000000 00000000 c0115e88 00000000 00000000 ee0b1a00 0000000e [ 13.780082] 9f40: b6f48000 ee889f80 0000000e ee888000 b6f48000 c00bfadc 00000000 00000003 [ 13.788241] 9f60: 00000000 00000000 00000000 ee0b1a00 ee0b1a00 0000000e b6f48000 c00bfdf4 [ 13.796401] 9f80: 00000000 00000000 ffffffff 0000000e b6f48000 b6edc5d0 00000004 c000e7a4 [ 13.804560] 9fa0: 00000000 c000e620 0000000e b6f48000 00000001 b6f48000 0000000e 00000000 [ 13.812719] 9fc0: 0000000e b6f48000 b6edc5d0 00000004 0000000e b6f4c8c0 000c3470 000bab70 [ 13.820879] 9fe0: 00000000 bed2aa50 b6e18bdc b6e6b52c 60000010 00000001 c0c0c0c0 c0c0c0c0 [ 13.829058] [<c0206028>] (_stop) from [<c020649c>] (pl330_control+0x70/0x2e8) [ 13.836165] [<c020649c>] (pl330_control) from [<c0208048>] (pl330_remove+0xb0/0xdc) [ 13.843800] [<c0208048>] (pl330_remove) from [<c0203c48>] (amba_remove+0x24/0xc0) [ 13.851272] [<c0203c48>] (amba_remove) from [<c023f694>] (__device_release_driver+0x70/0xc4) [ 13.859685] [<c023f694>] (__device_release_driver) from [<c023f704>] (device_release_driver+0x1c/0x28) [ 13.868971] [<c023f704>] (device_release_driver) from [<c023e8ec>] (unbind_store+0x58/0x90) [ 13.877303] [<c023e8ec>] (unbind_store) from [<c023e19c>] (drv_attr_store+0x20/0x2c) [ 13.885036] [<c023e19c>] (drv_attr_store) from [<c01167e4>] (sysfs_kf_write+0x44/0x48) [ 13.892928] [<c01167e4>] (sysfs_kf_write) from [<c0115e88>] (kernfs_fop_write+0xc0/0x17c) [ 13.901090] [<c0115e88>] (kernfs_fop_write) from [<c00bfadc>] (vfs_write+0xa0/0x1a8) [ 13.908812] [<c00bfadc>] (vfs_write) from [<c00bfdf4>] (SyS_write+0x40/0x8c) [ 13.915850] [<c00bfdf4>] (SyS_write) from [<c000e620>] (ret_fast_syscall+0x0/0x30) [ 13.923392] Code: e5813010 e12fff1e e92d40f0 e24dd00c (e590200c) [ 13.929467] ---[ end trace 10064e15a5929cf8 ]--- Terminate the thread and free channel resource only if channel resources were allocated (thread is not NULL). Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: <stable@vger.kernel.org> Fixes: b3040e4 ("DMA: PL330: Add dma api driver") Reviewed-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Vinod Koul <vinod.koul@intel.com>

davejiang and others added 30 commits October 17, 2014 07:08

ntb: move platform detection to separate function

b775e85

Move the platform detection function to separate functions to allow easier maintenence. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>

ntb: conslidate reading of PPD to move platform detection earlier

1db97f2

To simplify some of the platform detection code. Move the platform detection to a function to be called earlier. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>

s390: wire up bpf syscall

fcb1c2d

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

s390/uprobes: fix kprobes dependency

d6fe5be

If kprobes is disabled uprobes will not compile. Fix this by including the correct header files. Signed-off-by: Jan Willeke <willeke@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

[CIFS] Remove obsolete comment

ff273cb

Signed-off-by: Steven French <smfrench@gmail.com>

ralfbaechle and others added 21 commits October 24, 2014 13:27

MIPS: SEAD3: Fix I2C device registration.

4846f11

This isn't a module and shouldn't be one. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

wutiejun pushed a commit that referenced this pull request Oct 25, 2014

Merge pull request #1 from torvalds/master

5a0e3eb

同步torvalds的内核修改。

wutiejun merged commit 5a0e3eb into wutiejun:master Oct 25, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

同步torvalds的内核修改。 #1

同步torvalds的内核修改。 #1

wutiejun commented Oct 25, 2014

同步torvalds的内核修改。 #1

同步torvalds的内核修改。 #1

Conversation

wutiejun commented Oct 25, 2014