Multipathing

Red Hat documentation
SUSE documentation

Troubleshooting: Ubuntu SUSE SUSE


explanation explanation

sg_rtpg  /dev/sdx

Perform on each cluster node:

Install:

RHEL/Fedora:

dnf install -y device-mapper-multipath

SUSE/openSUSE

zypper install -y multipath-tools yast2-multipath

Assuming:
repeat similar procedure for qnap1.

Discover the iSCSI targets on a specific host

iscsiadm -m discovery -t sendtargets -p qnap1:3260 \
    --name discovery.sendtargets.auth.authmethod --value CHAP \
    --name discovery.sendtargets.auth.username --value sergey \
    --name discovery.sendtargets.auth.password --value abc123abc123 
 

Check the available iSCSI nodes and their known targets.

iscsiadm -m node

Delete any node(s) you do not want the iSCSI service to connect to when it is running:

iscsiadm -m node --op delete --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs1.e4cd7c"  -p qnap1:3260
iscsiadm -m node --op delete --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs3.e4cd7c"  -p qnap1:3260
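
When many targets are discovered, the delete commands can be generated rather than typed. The sketch below is not part of the original procedure: the function name `delete_cmds_except` is hypothetical, it assumes that `iscsiadm -m node` prints one `portal,tpgt target` pair per line, and the sample portal address is illustrative. It only prints commands; pipe the output to `sh` to run them.

```shell
#!/bin/sh
# Read "portal,tpgt target" lines (assumed `iscsiadm -m node` output format)
# and print a delete command for every target except the one to keep.
delete_cmds_except() {
    keep=$1
    while read -r portal target; do
        [ -n "$target" ] || continue            # skip blank or malformed lines
        [ "$target" = "$keep" ] && continue     # leave the wanted target alone
        printf 'iscsiadm -m node --op delete --targetname "%s" -p %s\n' \
            "$target" "${portal%%,*}"           # strip the ",tpgt" suffix
    done
}

# Illustrative input; on a real host use: iscsiadm -m node | delete_cmds_except ...
printf '%s\n' \
    "192.0.2.10:3260,1 iqn.2004-04.com.qnap:ts-569l:iscsi.xs1.e4cd7c" \
    "192.0.2.10:3260,1 iqn.2004-04.com.qnap:ts-569l:iscsi.xs2.e4cd7c" \
    "192.0.2.10:3260,1 iqn.2004-04.com.qnap:ts-569l:iscsi.xs3.e4cd7c" |
    delete_cmds_except "iqn.2004-04.com.qnap:ts-569l:iscsi.xs2.e4cd7c"
```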

Configure authentication for the remaining targets:

iscsiadm   --mode node  --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs2.e4cd7c"  -p qnap1:3260 --op=update --name node.session.auth.authmethod --value=CHAP
iscsiadm   --mode node  --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs2.e4cd7c"  -p qnap1:3260 --op=update --name node.session.auth.username --value=sergey
iscsiadm   --mode node  --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs2.e4cd7c"  -p qnap1:3260 --op=update --name node.session.auth.password --value=abc123abc123
iscsiadm   --mode node  --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs2.e4cd7c"  -p qnap1:3260 --login
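
The four commands above differ only in the setting name and value, so for several targets they can be produced by a small helper. This is a sketch with a hypothetical function name (`iscsi_chap_login_cmds`); it prints the commands instead of running them, and it assumes the credentials contain no spaces or shell metacharacters.

```shell
#!/bin/sh
# Print (not run) the iscsiadm commands that set CHAP credentials for one
# target and then log in to it.
iscsi_chap_login_cmds() {
    target=$1 portal=$2 user=$3 pass=$4
    for kv in "authmethod=CHAP" "username=$user" "password=$pass"; do
        # ${kv%%=*} is the setting name, ${kv#*=} is its value
        printf 'iscsiadm --mode node --targetname "%s" -p %s --op=update --name node.session.auth.%s --value=%s\n' \
            "$target" "$portal" "${kv%%=*}" "${kv#*=}"
    done
    printf 'iscsiadm --mode node --targetname "%s" -p %s --login\n' "$target" "$portal"
}

iscsi_chap_login_cmds "iqn.2004-04.com.qnap:ts-569l:iscsi.xs2.e4cd7c" qnap1:3260 sergey abc123abc123
```

Review the printed commands, then append `| sh` to execute them.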

Identify devices (WWID), on first cluster node only:

lsscsi
lsscsi -s
lsscsi -dg
lsscsi -c
lsscsi -Lvl

iscsiadm -m session [-P 3] [-o show]

/usr/lib/udev/scsi_id  --whitelisted --device=/dev/sdx

# list vendor/model
for i in `ls /dev/sd[a-z]` ; do j=`basename $i`; echo $i "[`cat /sys/block/$j/device/vendor`]"; echo $i "[`cat /sys/block/$j/device/model`]"; done

# list wwid's
for i in `ls /dev/sd[a-z]` ; do echo $i "`/usr/lib/udev/scsi_id --whitelisted --device=$i`"; done | sort -k2

# list vendor, model, revision, wwid/wwn in all formats etc.
for i in `ls /dev/sd[a-z]` ; do echo $i; /usr/lib/udev/scsi_id --whitelisted --device=$i --export; echo "--------------"; done

# list wwid in /dev/disk/by-id/wwid-xxxxxx format
for i in `ls /dev/sd[a-z]` ; do echo "$i `/usr/lib/udev/scsi_id --whitelisted --device=$i --export | grep ID_WWN_WITH_EXTENSION`"; done

# look for wwn under "designator type: NAA"
for i in `ls /dev/sd[a-z]` ; do echo $i; sg_vpd -i $i; echo "--------------"; done

To correlate /dev/sdx name to wwid to target-id

# map /dev/sdx to SCSI channel number
lsscsi -s

# map target-id to SCSI channel number
iscsiadm -m session -P 3

# map /dev/sdx to wwid
for i in `ls /dev/sd[a-z]` ; do echo $i "`/usr/lib/udev/scsi_id --whitelisted --device=$i`"; done | sort -k2
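
Paths to the same LUN report the same WWID, so grouping the `device wwid` pairs by WWID shows which sdx names belong together. A minimal sketch, assuming input lines of the form `device wwid` as printed by the loop above; the function name `group_by_wwid` and the sample device names are illustrative.

```shell
#!/bin/sh
# Read "device wwid" lines on stdin and print one line per WWID
# followed by all devices that report it.
group_by_wwid() {
    awk 'NF >= 2 { devs[$2] = (devs[$2] == "" ? $1 : devs[$2] " " $1) }
         END { for (w in devs) print w ": " devs[w] }' | sort
}

# Illustrative input; on a real host pipe the scsi_id loop into group_by_wwid.
printf '%s\n' \
    "/dev/sdb 36e843b668f682e2d00a0d4740d912ada" \
    "/dev/sdc 36000d310000065000000000000000020" \
    "/dev/sdd 36e843b668f682e2d00a0d4740d912ada" | group_by_wwid
```

A WWID listed with two or more devices is a candidate for multipathing; a WWID with a single device has only one path.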

To view target portal groups:

sg_rtpg /dev/sdx

Create /etc/multipath.conf

mpathconf --enable

Edit /etc/multipath.conf (on the first cluster node; it can then be replicated to the other nodes):

https://help.ubuntu.com/lts/serverguide/multipath-dm-multipath-config-file.html

To see defaults:

multipath -t
multipathd show config

Blacklist devices not to be multipathed

blacklist {
        # by wwid
        wwid  36000d310000065000000000000000020
        wwid  36000d310000065000000000000000021

        # by device names
        devnode "^sd[a-c]"
        # devnode "*"

        # by device type
        device {
               vendor  "IBM"
               product "S/390.*"
        }
        device {
               vendor  "HP"
               product "*"
        }
}

By default, the following devices are automatically blacklisted:
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^hd[a-z]"
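
To check quickly whether a device node name falls under these two default patterns, they can be tried with `grep -E`. This is only an approximation of how multipath evaluates its blacklist (it assumes the patterns behave as ordinary extended regular expressions), and the function name is hypothetical.

```shell
#!/bin/sh
# Return success if a device node name matches one of the two default
# blacklist patterns quoted above.
is_default_blacklisted() {
    printf '%s\n' "$1" |
        grep -Eq -e '^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*' -e '^hd[a-z]'
}

for dev in loop0 dm-3 hda sda sdb; do
    if is_default_blacklisted "$dev"; then
        echo "$dev: blacklisted by default"
    else
        echo "$dev: not blacklisted by default"
    fi
done
```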

Define blacklist exceptions:

blacklist_exceptions {
        # by wwid
        wwid  36000d310000065000000000000000030
        wwid  36000d310000065000000000000000031

}


Inheritance chain:  multipaths (by wwid) -> devices (by vendor/product) -> defaults.
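
The chain can be read as "first non-empty value wins". The sketch below models that lookup for a single setting; it illustrates the precedence only, not how multipath itself is implemented, and the function name and sample values are made up.

```shell
#!/bin/sh
# Pick the effective value of one setting: the multipaths-level value wins,
# then the devices-level value, then the defaults value.
# An empty string means "not set at this level".
effective_setting() {
    mp_value=$1 dev_value=$2 default_value=$3
    printf '%s\n' "${mp_value:-${dev_value:-$default_value}}"
}

effective_setting ""            multibus failover   # devices section wins
effective_setting group_by_prio multibus failover   # multipaths section wins
effective_setting ""            ""       failover   # falls back to defaults
```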

Define defaults:

defaults {
       ## if yes, use /etc/multipath/bindings to assign mpathN names
       ## if no, use wwid for multipath (default)
       ##
       ## wwid device names: /dev/disk/by-id/scsi-xxxxxxxx
       ##
       ## user-friendly device names: /dev/mapper/mpathN
       ##
       ##     unique and persistent through /etc/multipath/bindings or /var/lib/multipath/bindings
       ##     that stores and tracks wwn <-> N mapping
       ##
       ##     can be problematic for file systems containing root and /etc, or /var;
       ##     in the latter case can modify bindings file location with bindings_file
       ##
       ## alias names: /dev/mapper/<alias>
       ##
       ##     can be problematic for file systems containing root and /etc
       ##
       ## in cluster use wwid names or alias names, since user-friendly names are
       ## node-local and not cluster-global (unless bindings file is manually
       ## replicated)
       ##
        # user_friendly_names no
        # bindings_file "/etc/multipath/bindings"

        # if yes, create multipaths only if there are 2+ paths to the same wwid
        # if no, create multipaths to all non-blacklisted devices
        find_multipaths yes

        ## directory where udev creates its device nodes
        # udev_dir /dev

        ## directory for plugins
        # multipath_dir "/lib/multipath"

        ## verbosity level (0 to 6, default = 2)
        # verbosity 2

        ## interval between two path checks (seconds, default: 5)
        # polling_interval 5

        ## specify the timeout for path checkers, to detect the path has failed,
        ## this is a timeout for SCSI command
        ## default is taken from /sys/block/sdx/device/timeout
        # checker_timeout 30 # in seconds

        ## FibreChannel specific timeout values
        # fast_io_fail_tmo ...
        # dev_loss_tmo ...

        ## method to determine the path's state
        # path_checker readsector0   # read the first sector of the device (default)
        # path_checker directio      # read the first sector with direct I/O
        # path_checker tur           # issue SCSI command TEST UNIT READY to the device,
                                     # preferred over readsector0 or directio if the LUN supports tur,
                                     # as on failure it does not fill up the system log with messages
                                     #
                                     #    to verify TUR works fine, can issue manually:
                                     #        sg_turs -v /dev/sdx
                                     #
        # path_checker emc_clariion  # query EMC CLARiiON specific EVPD page 0xC0 to determine path state
        # path_checker hp_sw         # query HP storage array firmware
        # path_checker rdac          # RDAC LSI/Engenio storage controller proprietary


        ## If set to "once" , multipathd logs the first path checker error at logging level 2,
        ## any later errors are logged at level 3 until the device is restored.
        ## If set to "always" , multipathd always logs the path checker error at logging level 2.
        ## Default is "always".
        # log_checker_err "always"

        ## path selector algorithm to use within a primary pathgroup
        # path_selector "round-robin 0"      # spread load equally among all paths in the pathgroup
        # path_selector "queue-length 0"     # send next request batch to the path with the least amount of outstanding IO
        # path_selector "service-time 0"     # choose path for the next request batch based on the amount of outstanding IO
                                             # to the path and its relative throughput

        ## path grouping policy:
        ##
        ##     multibus            - all paths in the same priority group, traffic is load-balanced across
        ##                           all active paths in the group
        ##     failover            - separate priority group per each path, so only one path is used at a time
        ##     group_by_serial     - separate priority group per detected serial number (WWID)
        ##     group_by_node_name  - separate priority group per target node name (sys/class/fc_transport/target*/node_name)
        ##     group_by_prio       - separate priority group per each priority value; priorities are determined by
        ##                           callout program specified in the config file on a per-multipath, per-controller
        ##                           or global basis (parameter prio_callout)
        ##        
        ## A priority group is a set of paths that go to the same physical LUN.
        ## By default, I/O is distributed in a round-robin fashion across all paths in the group,
        ## unless an alternative path_selector is specified.
        ## The multipath framework multiplies the number of paths in a group by the group’s priority to determine
        ## which group is the primary. The group with the highest calculated value is the primary.
        ## When all paths in the primary group are failed, the priority group with the next highest value becomes active.       
        ##        
        # path_grouping_policy "multibus"

        ## Callout program and args used to obtain path priority values,
        ## when path_grouping_policy is "group_by_prio".
        ##
        ## Implemented for EMC, NetApp, Compaq/HP, Hitachi and other vendors, as well as for SCSI-3 ALUA.
        ##
        ## The specified program will be executed and should return a numeric value specifying
        ## the relative priority of this path. Higher numbers mean higher priority.
        ## A ’%n’ in the command line will be expanded to the device name, a ’%b’ will be expanded
        ## to the device number in major:minor format.
        ##
        ## Default is "none".
        ##
        # prio_callout none
        # prio_callout mpath_prio_alua /dev/%n

        ## default program and args to obtain a unique path identifier
        # getuid_callout "/lib/udev/scsi_id -g -u -s"
        # getuid_callout "/lib/udev/scsi_id --replace-whitespace --whitelisted --device=/dev/%n"

        ## device-mapper features to be used
        ## format is "number_of_features_plus_arguments feature1 ..."
        # features "1 queue_if_no_path"
       ## "queue_if_no_path" is equivalent to setting no_path_retry to "queue", i.e. if all paths
       ## are down, all processes waiting for IO will hang until one or more paths are restored
       ##
       ## to disable queuing, set features to "0" and set no_path_retry to, say, 50
       ## see more in here
       ##
       ## pg_init_retries <n>      Retry path group initialization up to n times before failing
       ##                          where 1 <= n <= 50
       ##
       ## pg_init_delay_msecs <n>  Wait n milliseconds between path group initialization retries
       ##                          where 0 <= n <= 60000
       ##
       # features "4 pg_init_delay_msecs 1000 pg_init_retries 30"

        ## number of times to retry the path until multipath fails the path and disables queueing to it
        ##
        # no_path_retry 5      # retry specified number of times
        # no_path_retry 0      # (default) fail over right away
        # no_path_retry fail   # immediate failure (no queueing), same as "0"
        # no_path_retry queue  # never stop queueing - keep queueing forever until the path comes alive

        ## only for kernels < 2.6.31, newer ones use rr_min_io_rq
        ## number of IO's to route to a path before switching to the next path in the same group (default: 1000)
        # rr_min_io 1000

        ## number of IO's to route to a path before switching to the next path in the same group (default: 1)
        # rr_min_io_rq 1
       
        ## if set to "priorities", configurator will assign path weights as "path prio * rr_min_io_rq"
        ## if set to "uniform", all path weights are equal
        # rr_weight "priorities"
        # rr_weight "uniform"

        ## If set to "yes", multipath will try to detect if the device supports ALUA.
        ## If so, the device will automatically use the ALUA prioritizer.
        ## If not, the prioritizer will be selected as usual.
        ## Default is "no".
        ##         
        # detect_prio "no"

        ## default method to set priority to the paths
        ##         
        # prio const        # set priority 1 to all paths (default)
        # prio alua         # generate path priority based on SCSI-3 ALUA settings
        # prio tpg_pref     # generate path priority based on SCSI-3 ALUA settings, using the preferred port bit
        # prio emc          # for EMC arrays
        # prio ontap        # for NetApp arrays
        # prio rdac         # for LSI/Engenio/NetApp E-series RDAC controller
        # prio hp_sw        # for HP/Compaq controller in active/standby mode
        # prio hds          # for Hitachi arrays

        ## Specifies whether to monitor the failed path recovery, and indicates the timing for group failback
        ## after failed paths return to service.
        ## When the failed path recovers, the path is added back into the multipath-enabled path list based on
        ## this setting. Multipath evaluates the priority groups, and changes the active priority group when
        ## the priority of the primary path exceeds the secondary group.
        ##         
        # failback immediate    # When a path recovers, enable the path immediately.
        # failback 5            # When the path recovers, wait n seconds before enabling the path.
                                # Specify an integer value greater than 0.
        # failback manual       # (Default) The failed path is not monitored for recovery.
                                # The administrator runs the multipath command to update enabled paths and priority groups.
        # failback followover   # only perform automatic failback when the first path of a pathgroup becomes active,
                                # this keeps a node from automatically failing back when another node requested
                                # the failover

        ## if set to "no", multipathd will disable queueing for all devices when it is shut down (default: "yes")
        # queue_without_daemon "yes"

        ## if set to "yes", multipathd will disable queueing for all devices when the last path
        ## to the device has been deleted (default: "no")
        # flush_on_last_del "yes"

        ## Service action reservation key used by mpathpersist.
        ## Must be set for all multipath devices using persistent reservations.
        ## Must be the same as the RESERVATION KEY field of the PERSISTENT RESERVE OUT parameter list
        ## which contains an 8-byte value provided by the application client to the device server
        ## to identify the I_T nexus.
        ## Default: unset.
        ##
        # reservation_key ...

        ## udev attribute that provides a unique path identifier (default: ID_SERIAL)
        # uid_attribute "ID_SERIAL"
}
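
The leading number in a features value counts every word that follows it, feature names and their arguments alike, which is easy to get wrong when editing by hand. A small sketch that builds the value from its parts (the function name `features_value` is hypothetical):

```shell
#!/bin/sh
# Build a features value: the leading number is the count of all words
# that follow (feature names and their arguments together).
features_value() {
    if [ "$#" -eq 0 ]; then
        printf '0\n'
    else
        printf '%s %s\n' "$#" "$*"
    fi
}

features_value queue_if_no_path
features_value pg_init_delay_msecs 1000 pg_init_retries 30
features_value
```

The second call yields the same string as the `features "4 pg_init_delay_msecs 1000 pg_init_retries 30"` example in the defaults block.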

Define multipath attributes for each specific multipath device.
Specify alias and override defaults specified in defaults and devices, based on wwid.

Can override here

path_grouping_policy
path_selector
failback
prio
prio_args
no_path_retry
rr_min_io
rr_weight
flush_on_last_del


multipaths {
        multipath {
                wwid                  3600508b4000156d700012000000b0000

                # alias: optional, symbolic name for multipath map
                alias                 data_1

                path_grouping_policy  multibus
                path_checker          readsector0
                path_selector         "round-robin 0"
                failback              manual
                rr_weight             priorities
                no_path_retry         5
        }
        multipath {
                wwid                  1DEC_____321816758474
                alias                 data_2
                rr_weight             priorities
        }
}
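
In a cluster the same alias has to map to the same WWID on every node, so it can help to generate the stanzas from one list instead of editing them by hand. A minimal sketch with a hypothetical function name, emitting only the wwid and alias lines; add further per-multipath settings as needed.

```shell
#!/bin/sh
# Print a multipath { } stanza for one WWID/alias pair.
multipath_stanza() {
    printf '        multipath {\n'
    printf '                wwid   %s\n' "$1"
    printf '                alias  %s\n' "$2"
    printf '        }\n'
}

# Wrap stanzas for several LUNs in a multipaths { } section.
echo 'multipaths {'
multipath_stanza 3600508b4000156d700012000000b0000 data_1
multipath_stanza 1DEC_____321816758474 data_2
echo '}'
```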


Define devices.
Specify settings overriding the defaults on a per-vendor/product/revision basis.

Vendor: 8 characters.
Product: 16 characters.
Revision: 4 characters.
Trailing spaces are important!
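
Because the fields are fixed-width, a vendor or product string can be padded to the right length with `printf` rather than by counting spaces. A sketch with a hypothetical helper `pad_field`; it pads with spaces but does not truncate strings that are already longer than the width.

```shell
#!/bin/sh
# Left-justify a string and pad it with spaces to a fixed width.
# $1 = string, $2 = width
pad_field() {
    printf "%-${2}s" "$1"
}

# Print quoted vendor/product lines padded to 8 and 16 characters.
printf 'vendor  "%s"\n' "$(pad_field COMPAQ 8)"
printf 'product "%s"\n' "$(pad_field MSA1000 16)"
```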

Can override here:

path_grouping_policy
path_selector
failback
prio
prio_args
no_path_retry
rr_min_io
rr_weight
flush_on_last_del

getuid_callout
path_checker
features
fast_io_fail_tmo
dev_loss_tmo

devices {
        device {
                vendor                "COMPAQ  "
                product               "HSV110 (C)COMPAQ"
                #revision             "1530"
                path_grouping_policy  multibus
                path_checker          readsector0
                path_selector         "round-robin 0"
                failback              15
                rr_weight             priorities
                no_path_retry         queue

                # Module to perform hardware-specific actions
                # when switching paths or handling IO errors.
                #
                #     1 emc =>  for EMC storage arrays
                #     1 alua => for SCSI-3 ALUA arrays
                #     1 hp_sw => for Compaq/HP controllers
                #     1 rdac => for LSI/Engenio RDAC controllers
                #
                hardware_handler      "1 hp_sw"
        }
        device {
                vendor                "COMPAQ  "
                product               "MSA1000         "
                path_grouping_policy  multibus
        }

        # specify which products of specific vendor to blacklist
        device {
                vendor                "Emulex"
                product_blacklist     "XYZ1*"
        }
}

Example /etc/multipath.conf:

blacklist {
        wwid "*"
}

blacklist_exceptions {
        wwid 36e843b668f682e2d00a0d4740d912ada
}

defaults {
        # verbosity 2
        # bindings_file "/etc/multipath/bindings"
        user_friendly_names  no
        find_multipaths yes
        polling_interval 5
        rr_min_io_rq 1
        # getuid_callout "/lib/udev/scsi_id --replace-whitespace --whitelisted --device=/dev/%n"
        # queue_without_daemon yes
        # flush_on_last_del "yes"
}

devices {
        device {
                vendor "QNAP +"
                product "iSCSI Storage +"
                path_grouping_policy multibus
                path_selector "service-time 0"
                # path_checker readsector0   # read the first sector of the device (default)
                # path_checker directio      # read the first sector with direct I/O
                path_checker tur           # issue SCSI command TEST UNIT READY to the device
                failback 5
                prio const
                rr_weight uniform
                features "0"
                no_path_retry 0
        }
}

multipaths {
        multipath {
                wwid    36e843b668f682e2d00a0d4740d912ada
                alias   xs2
        }
}

After editing multipath.conf, changes are not automatically committed yet.
To display what a setup would look like (dry run), execute

multipath -v3 -d (displayed paths are grouped into priority groups)

Start the multipath daemon

# modprobe dm-multipath
systemctl restart multipathd.service

Verify multipath volumes are configured properly:

multipath -ll
multipath -ll -v 3

multipathd show paths
multipathd show maps
multipathd show maps status
multipathd show maps stats
multipathd show topology
multipathd show devices

multipathd show config
multipathd show blacklist

multipathd reconfigure

dmsetup ls --tree
dmsetup info
dmsetup status
dmsetup table

To reconfigure (if needed):

systemctl stop multipathd.service

multipath -F                   => flush all unused maps
multipath -r [-v3]             => reload

multipath -f /dev/mapper/xxx   => flush if unused

systemctl restart multipathd.service

Optionally test disk access

hdparm -tT /dev/mapper/xs2

Fdisk cannot be used with /dev/mapper/xxx devices. Use fdisk on the underlying disk and then execute kpartx command to recognize the partition:

fdisk /dev/sdx
kpartx -avs /dev/mapper/xs2   => creates /dev/mapper/xs2p1

Enable multipath daemon to be started on boot

systemctl enable multipathd.service

Replicate /etc/multipath.conf to other nodes in the cluster:

rsync -av /etc/multipath.conf vc2:/etc
rsync -av /etc/multipath.conf vc3:/etc

Configure LVM to ignore underlying single-path devices.
Otherwise LVM will detect multiple copies of the same physical volume.
Edit /etc/lvm/lvm.conf:

On newer versions of LVM, ensure

in lvm.conf:  devices/multipath_component_detection=1
verify: lvm dumpconfig | grep multipath_component_detection

On older versions, follow guidelines here to set device filter in lvm.conf:  SUSE  RHEL (ch 2.4) UCONN NOVELL NOVELL

Repeat the procedure for other nodes in the cluster.

Make sure the cluster is started

pcs cluster status
corosync-quorumtool -oi -i

if not, then start it:  pcs cluster start --all

Create LVM volume

pvcreate /dev/mapper/xs2p1
vgcreate --clustered y vg2 /dev/mapper/xs2p1
lvcreate vg2 --name lv1 --extents 100%FREE

dmsetup ls --tree

vgdisplay vg2
vgs
lvdisplay vg2

Proceed with creating a file system on /dev/vg2/lv1



To remove a path from a multipathed storage device, execute on the underlying sdx path:

echo offline > /sys/block/sdx/device/state
echo 1 > /sys/block/sdx/device/delete


or using corresponding hba:ch:trg:lun

echo offline > /sys/class/scsi_device/hba:ch:trg:lun/device/state
echo 1 > /sys/class/scsi_device/hba:ch:trg:lun/device/delete
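
Both forms write to the same two sysfs attributes and differ only in the directory they start from. The sketch below builds the commands for either form; the function name is hypothetical, the address `3:0:0:1` is illustrative, and the commands are printed rather than executed so they can be reviewed first.

```shell
#!/bin/sh
# Print (not run) the two sysfs writes that take a path offline and delete it.
# Accepts either a block device name (sdx) or an hba:ch:trg:lun address.
remove_path_cmds() {
    case $1 in
        *:*:*:*) dir=/sys/class/scsi_device/$1/device ;;
        *)       dir=/sys/block/$1/device ;;
    esac
    printf 'echo offline > %s/state\n' "$dir"
    printf 'echo 1 > %s/delete\n' "$dir"
}

remove_path_cmds sdx
remove_path_cmds 3:0:0:1
```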

Adding a storage device or path:

see here (25.10, 25.11)

after adding a path device, execute:

multipath -r
multipath -ll