
Changelog: https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.6.76

Manually rebased:
  bcm27xx/patches-6.6/950-0519-usb-dwc3-Set-DMA-and-coherent-masks-early.patch
  imx/patches-6.6/600-PCI-imx6-Start-link-at-max-gen-first-for-IMX8MM-and-IMX8MP.patch

Removed upstreamed:
  bcm27xx/patches-6.6/950-1446-media-i2c-ov9282-Correct-the-exposure-offset.patch[1]
  bcm47xx/patches-6.6/701-bgmac-reduce-max-frame-size-to-support-just-MTU-1500.patch[2]
  bcm53xx/patches-6.6/700-bgmac-reduce-max-frame-size-to-support-just-MTU-1500.patch[3]
  ramips/patches-6.6/003-v6.14-clk-ralink-mtmips-remove-duplicated-xtal-clock-for-Ralink.patch[4]

All other patches automatically rebased.

1. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.6.76&id=11c7649c9ec3dcaf0a7760551ad30747d9e02d81
2, 3. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.6.76&id=5e6e723675e54ced5200bcc367e2526badc4070c
4. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.6.76&id=d0edcd0d18d700d76c61c091a24568b8b8c3b387

Build system: x86/64
Build-tested: bcm27xx/bcm2712, flogic/xiaomi_redmi-router-ax6000-ubootmod, ramips/tplink_archer-a6-v3
Run-tested: bcm27xx/bcm2712, flogic/xiaomi_redmi-router-ax6000-ubootmod, ramips/tplink_archer-a6-v3

Signed-off-by: John Audia <therealgraysky@proton.me>
Link: https://github.com/openwrt/openwrt/pull/17822
(cherry picked from commit 84e370f16ca044896e6529d683067c17b565e6fa)
Link: https://github.com/openwrt/openwrt/pull/17987
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
51 lines
2.0 KiB
Diff
From e9e852af347ae3ccee4e7abb01f9ef91387980f9 Mon Sep 17 00:00:00 2001
From: Jonathan Bell <jonathan@raspberrypi.com>
Date: Wed, 6 Nov 2024 11:07:55 +0000
Subject: [PATCH] drivers: usb: xhci: prevent a theoretical race on
 non-coherent platforms

For platforms that have xHCI controllers attached over PCIe, and
non-coherent routes to main memory, a theoretical race exists between
posting new TRBs to a ring, and writing to the doorbell register.

In a contended system, write traffic from the CPU may be stalled before
the memory controller, whereas the CPU to Endpoint route is separate
and not likely to be contended. Similarly, the DMA route from the
endpoint to main memory may be separate and uncontended.

Therefore the xHCI can receive a doorbell write and find a stale view
of a transfer ring. In cases where only a single TRB is ping-ponged at
a time, this can cause the endpoint to not get polled at all.

Adding a readl() before the write forces a round-trip transaction
across PCIe, definitively serialising the CPU along the PCI
producer-consumer ordering rules.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
---
 drivers/usb/host/xhci-ring.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -507,6 +507,19 @@ void xhci_ring_ep_doorbell(struct xhci_h
 
 	trace_xhci_ring_ep_doorbell(slot_id, DB_VALUE(ep_index, stream_id));
 
+	/*
+	 * For non-coherent systems with PCIe DMA (such as Pi 4, Pi 5) there
+	 * is a theoretical race between the TRB write and barrier, which
+	 * is reported complete as soon as the write leaves the CPU domain,
+	 * the doorbell write, which may be reported as complete by the RC
+	 * at some arbitrary point, and the visibility of new TRBs in system
+	 * RAM by the endpoint DMA engine.
+	 *
+	 * This read before the write positively serialises the CPU state
+	 * by incurring a round-trip across the link.
+	 */
+	readl(db_addr);
+
 	writel(DB_VALUE(ep_index, stream_id), db_addr);
 	/* flush the write */
 	readl(db_addr);
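The commit message argues that the xHC can act on a doorbell while RAM still holds a stale copy of the transfer ring. The toy model below is one rough way to picture that. It is a user-space sketch, not kernel code, and every `toy_*` name is hypothetical. A single queued store stands in for CPU write traffic stalled before the memory controller. `toy_serialise()` encodes a modelling assumption: once the round-trip read completes, that store has landed. The device side follows the xHCI ownership rule, under which the controller treats a TRB as new work only if its cycle bit matches the ring's consumer cycle state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of one transfer-ring slot; not kernel code. */
struct toy_trb {
	uint32_t payload;
	bool cycle;			/* ownership (cycle) bit */
};

struct toy_system {
	struct toy_trb ram;		/* what the endpoint's DMA engine reads */
	struct toy_trb queued;		/* CPU store stalled before the memory controller */
	bool write_pending;
};

/* CPU posts a TRB: the store leaves the CPU but only reaches a queue. */
void toy_cpu_post_trb(struct toy_system *s, uint32_t payload, bool cycle)
{
	assert(!s->write_pending);	/* this toy models one in-flight store */
	s->queued.payload = payload;
	s->queued.cycle = cycle;
	s->write_pending = true;
}

/*
 * Modelling assumption: once a serialising round trip completes, any
 * queued CPU store has landed in RAM.
 */
void toy_serialise(struct toy_system *s)
{
	if (s->write_pending) {
		s->ram = s->queued;
		s->write_pending = false;
	}
}

/* Doorbell: the device fetches the slot and acts only if it owns the TRB. */
bool toy_device_doorbell(const struct toy_system *s, bool consumer_cycle)
{
	return s->ram.cycle == consumer_cycle;
}

/* Returns true if the device sees the new TRB when the doorbell arrives. */
bool toy_run_scenario(bool serialise_before_doorbell)
{
	const bool consumer_cycle = true;	/* device expects the flipped bit */
	struct toy_system s = {
		.ram = { .payload = 0, .cycle = !consumer_cycle },	/* old TRB */
	};

	toy_cpu_post_trb(&s, 0x1234, consumer_cycle);
	if (serialise_before_doorbell)
		toy_serialise(&s);
	return toy_device_doorbell(&s, consumer_cycle);
}
```

In this model, `toy_run_scenario(false)` returns false: the doorbell finds the old cycle bit, so the device does nothing. `toy_run_scenario(true)` returns true. The model makes no claim about real timing. It only shows why a stale cycle bit turns a doorbell into a no-op.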
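The access order that the patch establishes in `xhci_ring_ep_doorbell()` can be traced in user space. This sketch is hypothetical throughout:

- `toy_readl()` and `toy_writel()` only log accesses to a plain variable that stands in for the doorbell register.
- `atomic_thread_fence()` stands in for the kernel's `wmb()`.
- `TOY_DB_VALUE()` is modelled on the `DB_VALUE()` macro in `xhci.h`: the doorbell target is `ep_index + 1`, and the stream ID sits in bits 31:16.

The sketch records the sequence only. It does not reproduce the PCIe behaviour that makes the sequence matter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Modelled on DB_VALUE() in xhci.h: target = ep_index + 1, stream ID in bits 31:16. */
#define TOY_DB_VALUE(ep, stream) ((((ep) + 1) & 0xff) | ((stream) << 16))

enum toy_access { TOY_TRB_STORE, TOY_DB_READ, TOY_DB_WRITE };

/* Hypothetical access log, so the ordering can be inspected afterwards. */
struct toy_trace {
	enum toy_access ops[8];
	size_t n;
};

static void toy_log(struct toy_trace *t, enum toy_access op)
{
	assert(t->n < sizeof(t->ops) / sizeof(t->ops[0]));
	t->ops[t->n++] = op;
}

/* Hypothetical stand-ins for readl()/writel(): they touch a plain variable. */
uint32_t toy_readl(struct toy_trace *t, const volatile uint32_t *reg)
{
	toy_log(t, TOY_DB_READ);
	return *reg;
}

void toy_writel(struct toy_trace *t, uint32_t val, volatile uint32_t *reg)
{
	toy_log(t, TOY_DB_WRITE);
	*reg = val;
}

/* Same access order as the patched xhci_ring_ep_doorbell(). */
void toy_ring_doorbell(struct toy_trace *t, volatile uint32_t *db,
		       unsigned int ep_index, unsigned int stream_id)
{
	(void)toy_readl(t, db);	/* new: round trip before the doorbell write */
	toy_writel(t, TOY_DB_VALUE(ep_index, stream_id), db);
	(void)toy_readl(t, db);	/* existing: flush the posted write */
}

/* Producer: store the TRB, order it, then ring the doorbell. */
void toy_queue_and_ring(struct toy_trace *t, uint32_t *ring_slot, uint32_t trb,
			volatile uint32_t *db, unsigned int ep_index,
			unsigned int stream_id)
{
	*ring_slot = trb;
	toy_log(t, TOY_TRB_STORE);
	/*
	 * User-space stand-in for wmb(). Per the patch comment, the kernel
	 * barrier can complete as soon as the write leaves the CPU domain,
	 * before the TRB is visible to the endpoint's DMA engine.
	 */
	atomic_thread_fence(memory_order_release);
	toy_ring_doorbell(t, db, ep_index, stream_id);
}
```

After `toy_queue_and_ring()`, the trace reads: TRB store, doorbell read, doorbell write, doorbell read. The first read is the one this patch adds. The last is the existing "flush the write" read-back.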