[Paranoia] Parallel ripping causes extreme slowdowns
Douglas Gilbert
dgilbert at interlog.com
Mon Dec 16 19:21:55 UTC 2019
On 2019-12-16 2:36 a.m., Merlijn B.W. Wajer wrote:
> Hi,
>
> This was indeed the problem. The problem was not related to the
> USB<->SATA bridges, but indeed in the sr driver itself, see inline.
>
> On 16/12/2019 07:21, Merlijn B.W. Wajer wrote:
>> Hi,
>>
>> Bingo, I think.
>>
>> On 16/12/2019 04:54, Merlijn B.W. Wajer wrote:
>>>
>>> Using `scsi_logging_level -s H 2 -M 2` I've been able to see how
>>> commands are submitted and executed, and what surprises me is that it
>>> seems like there is never an instance where commands to both the drives
>>> (or more than two, really) are submitted at the same time. That is, it
>>> seems like Linux always waits for one command to complete before
>>> submitting another one, even though the device is on a different SCSI
>>> host [3]. This makes no sense to me. Maybe SCSI logging output somehow
>>> confuses me, but I don't think so, as what I am observing would explain
>>> why a RESET (takes a while) in particular slows down all the drives - no
>>> other commands are submitted during that time.
>>>
>>> From looking at the source in drivers/usb/storage - there is nothing
>>> that seems to utilise some global lock (only per device or per host), so
>>> it's unclear to me why this would be happening - but I probably don't
>>> know enough about the kernel block/io schedulers. This [4] is a typical
>>> kernel stack for cdparanoia, btw.
>>
>> This commit introduces a sr global mutex, when removing the BKL:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/drivers/scsi/sr.c?id=2a48fc0ab24241755dc93bfd4f01d68efab47f5a
>>
>> A part of this code is still in place today, in sr_block_ioctl (see
>> below callstack)
>>
>>> [4] # cat /proc/6380/stack
>>> [<0>] blk_execute_rq+0x75/0xb0
>>> [<0>] sg_io+0x193/0x400
>>> [<0>] scsi_cmd_ioctl+0x260/0x440
>>> [<0>] scsi_cmd_blk_ioctl+0x51/0x70
>>> [<0>] cdrom_ioctl+0x41/0x124a
>>> [<0>] sr_block_ioctl+0xa5/0xd0
>>> [<0>] blkdev_ioctl+0x926/0xa00
>>> [<0>] block_ioctl+0x3d/0x50
>>> [<0>] do_vfs_ioctl+0xa9/0x640
>>> [<0>] ksys_ioctl+0x75/0x80
>>> [<0>] __x64_sys_ioctl+0x1a/0x20
>>> [<0>] do_syscall_64+0x5a/0x120
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [<0>] 0xffffffffffffffff
>>
>> I guess next steps are to see how this can be turned into a per device
>> mutex.
>
> With a simple change to drivers/scsi/sr.c and drivers/scsi/sr.h
> (introducing a per-device lock to struct scsi_cd, replacing sr_mutex
> with a per device lock) - the problem no longer occurs and reads from
> different sr devices happily interleave:
>
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 Send: scmd 0x0000000000000000
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 CDB: Volume set (in), Read cd be 00 00 00 10 1f 00 00 1b f8 00 00
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 CDB: Volume set (in), Read cd be 00 00 00 10 1f 00 00 1b f8 00 00
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 124 sectors total, 63504 bytes done.
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 7:0:0:0: [sr1] tag#0 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 7:0:0:0: [sr1] tag#0 CDB: Volume set (in), Read cd be 00 00 00 0e a5 00 00 1b f8 00 00
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 7:0:0:0: [sr1] tag#0 124 sectors total, 63504 bytes done.
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 Send: scmd 0x0000000000000000
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 CDB: Volume set (in), Read cd be 00 00 00 10 3a 00 00 1b f8 00 00
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 7:0:0:0: [sr1] tag#0 Send: scmd 0x0000000000000000
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 7:0:0:0: [sr1] tag#0 CDB: Volume set (in), Read cd be 00 00 00 0e c0 00 00 1b f8 00 00
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 CDB: Volume set (in), Read cd be 00 00 00 10 3a 00 00 1b f8 00 00
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 124 sectors total, 63504 bytes done.
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 Send: scmd 0x0000000000000000
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 CDB: Volume set (in), Read cd be 00 00 00 10 55 00 00 1b f8 00 00
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 7:0:0:0: [sr1] tag#0 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 7:0:0:0: [sr1] tag#0 CDB: Volume set (in), Read cd be 00 00 00 0e c0 00 00 1b f8 00 00
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 7:0:0:0: [sr1] tag#0 124 sectors total, 63504 bytes done.
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 7:0:0:0: [sr1] tag#0 Send: scmd 0x0000000000000000
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 7:0:0:0: [sr1] tag#0 CDB: Volume set (in), Read cd be 00 00 00 0e db 00 00 1b f8 00 00
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 CDB: Volume set (in), Read cd be 00 00 00 10 55 00 00 1b f8 00 00
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 124 sectors total, 63504 bytes done.
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 Send: scmd 0x0000000000000000
>> Dec 16 08:20:01 gentoo-x230 kernel: sr 6:0:0:0: [sr0] tag#0 CDB: Volume set (in), Read cd be 00 00 00 10 70 00 00 1b f8 00 00
> Will see if I can turn this into a patch that can be accepted on
> mainline Linux... Hope this was useful and no too spammy!
Hi,
Perhaps I'm a bit biased, but have you thought of using a sg device
rather than a sr device? lsscsi will show you the mapping between the
two.
The original USB Mass Storage protocol is !@#$. Avoid it at all costs.
It collapses all SCSI error code (often spread across 18 bytes) into
a single bit: it worked or it didn't.
UAS(P) is the way to go, if you can get it. I have a USB-C to SATA dongle
here that does UAS:
# lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
|__ Port 3: Dev 8, If 0, Class=Mass Storage, Driver=uas, 5000M
|__ Port 6: Dev 6, If 0, Class=Hub, Driver=hub/4p, 5000M
|__ Port 4: Dev 7, If 0, Class=Hub, Driver=hub/3p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/12p, 480M
|__ Port 6: Dev 7, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 4: Dev 8, If 0, Class=Hub, Driver=hub/3p, 480M
|__ Port 3: Dev 9, If 0, Class=Vendor Specific Class,
Driver=ti_usb_3410_5052, 12M
|__ Port 7: Dev 3, If 0, Class=Wireless, Driver=btusb, 12M
|__ Port 7: Dev 3, If 1, Class=Wireless, Driver=btusb, 12M
|__ Port 9: Dev 5, If 0, Class=Vendor Specific Class, Driver=, 12M
Doug Gilbert
More information about the Paranoia
mailing list