Multipathing
- Devices are identified by WWID
- WWID = 3yyyyyyyyyyyyyyyyy
WWN = 0xyyyyyyyyyyyyyyy
The leading 3 in the WWID comes from the identifier type number in SCSI VPD page 0x83: 3 = NAA.
- The setup described below is based on paths that either lie within different IP subnets or
use interface bonding, with packets routed by the IP layer.
It is also possible to explicitly route SCSI packets in the iSCSI initiator
layer by registering multiple interfaces with iscsiadm --mode iface.
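The WWID/WWN relation noted in the list above can be sketched as a pair of small shell helpers. This is a minimal sketch that assumes NAA-type identifiers only; the function names wwn_to_wwid and wwid_to_wwn are hypothetical and not part of any tool.

```shell
# Convert a WWN written as 0x... into the WWID form (leading "3" = NAA).
wwn_to_wwid() {
    wwn=${1#0x}                 # drop the 0x prefix if present
    printf '3%s\n' "$wwn"
}

# Reverse direction: strip the leading type digit "3" and prepend 0x.
wwid_to_wwn() {
    case $1 in
        3*) printf '0x%s\n' "${1#3}" ;;
        *)  echo "not an NAA-type WWID: $1" >&2; return 1 ;;
    esac
}

# Example (not executed here):
#   wwn_to_wwid 0x6e843b668f682e2d00a0d4740d912ada
```

A WWID that does not start with 3 (such as 1DEC_____321816758474 used later in these notes) is rejected by wwid_to_wwn, since the simple prefix rule does not apply to it.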
Perform on each cluster node:
Install:
RHEL/Fedora:
dnf install -y device-mapper-multipath
SUSE/openSUSE:
zypper install -y multipath-tools yast2-multipath
Assuming:
- the target is published (dual-ported) on portals qnap1x:3260 and qnap1:3260
- qnap1x is already connected as described in "Connecting iSCSI drives" above
Repeat a similar procedure for qnap1.
Discover the iSCSI targets on a specific host:
iscsiadm -m discovery -t sendtargets -p qnap1:3260 \
    --name discovery.sendtargets.auth.authmethod --value CHAP \
    --name discovery.sendtargets.auth.username --value sergey \
    --name discovery.sendtargets.auth.password --value abc123abc123
Check the available iSCSI nodes and their known targets.
iscsiadm -m node
Delete the node(s) you do not want to connect to when the service is on, with
the following commands:
iscsiadm -m node --op delete \
    --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs1.e4cd7c" -p qnap1:3260
iscsiadm -m node --op delete \
    --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs3.e4cd7c" -p qnap1:3260
Configure authentication for the remaining targets:
iscsiadm --mode node --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs2.e4cd7c" -p qnap1:3260 \
    --op=update --name node.session.auth.authmethod --value=CHAP
iscsiadm --mode node --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs2.e4cd7c" -p qnap1:3260 \
    --op=update --name node.session.auth.username --value=sergey
iscsiadm --mode node --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs2.e4cd7c" -p qnap1:3260 \
    --op=update --name node.session.auth.password --value=abc123abc123
iscsiadm --mode node --targetname "iqn.2004-04.com.qnap:ts-569l:iscsi.xs2.e4cd7c" -p qnap1:3260 --login
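When several targets need the same CHAP settings, the four commands above can be wrapped in one function. The following is a sketch only: the function name iscsi_chap_login and the ISCSIADM override variable are assumptions made for illustration, and only iscsiadm options already used above appear in it.

```shell
# Command used to invoke iscsiadm; set ISCSIADM="echo iscsiadm" for a dry run
# that only prints the commands instead of executing them.
ISCSIADM=${ISCSIADM:-iscsiadm}

# Usage: iscsi_chap_login <target-iqn> <portal> <user> <password>
iscsi_chap_login() {
    target=$1 portal=$2 user=$3 pass=$4
    # Update the three node.session.auth.* fields one by one.
    for kv in "authmethod=CHAP" "username=$user" "password=$pass"; do
        $ISCSIADM --mode node --targetname "$target" -p "$portal" \
            --op=update --name "node.session.auth.${kv%%=*}" --value="${kv#*=}" || return 1
    done
    # Log in once all settings are stored.
    $ISCSIADM --mode node --targetname "$target" -p "$portal" --login
}
```

Running it first with ISCSIADM="echo iscsiadm" shows the exact commands before anything is changed. Note that the password is visible on the command line, exactly as in the manual commands.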
Identify devices (WWID), on the first cluster node only:
lsscsi
lsscsi -s
lsscsi -dg
lsscsi -c
lsscsi -Lvl
iscsiadm -m session [-P 3] [-o show]
/usr/lib/udev/scsi_id --whitelisted --device=/dev/sdx
# list vendor/model
for i in /dev/sd[a-z]; do j=$(basename "$i");
    echo "$i [$(cat "/sys/block/$j/device/vendor")]";
    echo "$i [$(cat "/sys/block/$j/device/model")]";
done
# list wwid's
for i in /dev/sd[a-z]; do
    echo "$i $(/usr/lib/udev/scsi_id --whitelisted --device="$i")";
done | sort -k2
# list vendor, model, revision, wwid/wwn in all formats etc.
for i in /dev/sd[a-z]; do
    echo "$i"; /usr/lib/udev/scsi_id --whitelisted --device="$i" --export;
    echo "--------------";
done
# list wwn in /dev/disk/by-id/wwn-xxxxxx format
for i in /dev/sd[a-z]; do
    echo "$i $(/usr/lib/udev/scsi_id --whitelisted --device="$i" --export | grep ID_WWN_WITH_EXTENSION)";
done
# look for wwn under "designator type: NAA"
for i in /dev/sd[a-z]; do
    echo "$i"; sg_vpd -i "$i"; echo "--------------";
done
To correlate the /dev/sdx name to the wwid and to the target-id:
# map /dev/sdx to SCSI channel number
lsscsi -s
# map target-id to SCSI channel number
iscsiadm -m session -P 3
# map /dev/sdx to wwid
for i in /dev/sd[a-z]; do
    echo "$i $(/usr/lib/udev/scsi_id --whitelisted --device="$i")";
done | sort -k2
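The "device wwid" listing above becomes easier to read when devices sharing a WWID are printed on one line, since those are the paths to the same LUN. A minimal sketch, assuming input lines of the form "<device> <wwid>"; the function name group_by_wwid is hypothetical.

```shell
# Read "device wwid" pairs on stdin; print one line per WWID followed by
# all devices that report it. A WWID with 2+ devices is a multipath candidate.
group_by_wwid() {
    awk 'NF >= 2 { paths[$2] = (paths[$2] ? paths[$2] " " : "") $1 }
         END { for (w in paths) print w ": " paths[w] }' | sort
}

# Example (not executed here):
#   for i in /dev/sd[a-z]; do
#       echo "$i $(/usr/lib/udev/scsi_id --whitelisted --device="$i")"
#   done | group_by_wwid
```

Devices for which scsi_id returns nothing produce a one-field line and are skipped.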
To view target portal groups:
sg_rtpg /dev/sdx
Create /etc/multipath.conf:
mpathconf --enable
Edit /etc/multipath.conf (on the first cluster node; it can then be replicated
to the other nodes).
To see defaults:
multipath -t
multipathd show config
Blacklist devices not to be multipathed:
blacklist {
    # by wwid
    wwid 36000d310000065000000000000000020
    wwid 36000d310000065000000000000000021
    # by device names
    devnode "^sd[a-c]"
    # devnode "*"
    # by device type
    device {
        vendor "IBM"
        product "S/390.*"
    }
    device {
        vendor "HP"
        product "*"
    }
}
By default, the following devices are automatically blacklisted:
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^hd[a-z]"
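To check quickly whether a device name would fall under the two default devnode patterns above, the patterns can be tried with grep. This is a sketch that assumes grep -E matches these patterns the same way multipath does; the function name is_default_blacklisted is hypothetical.

```shell
# Return 0 if the device node name (without /dev/) matches one of the
# default devnode blacklist patterns listed above, 1 otherwise.
is_default_blacklisted() {
    printf '%s\n' "$1" |
        grep -Eq '^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*|^hd[a-z]'
}

# Example (not executed here):
#   for d in sda loop0 dm-3; do
#       is_default_blacklisted "$d" && echo "$d: blacklisted by default"
#   done
```

The patterns are anchored only at the start, so any name that begins with one of the listed prefixes matches.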
Define blacklist exceptions:
blacklist_exceptions {
    # by wwid
    wwid 36000d310000065000000000000000030
    wwid 36000d310000065000000000000000031
}
Inheritance chain:
multipaths (by wwid) -> devices (by vendor/product) -> defaults.
Define defaults:
defaults {
    ## if yes, use /etc/multipath/bindings to assign mpathN names
    ## if no, use wwid for multipath (default)
    ##
    ## wwid device names: /dev/disk/by-id/scsi-xxxxxxxx
    ##
    ## user-friendly device names: /dev/mapper/mpathN
    ##
    ##     unique and persistent through /etc/multipath/bindings or
    ##     /var/lib/multipath/bindings
    ##     that stores and tracks wwn <-> N mapping
    ##
    ##     can be problematic for file systems containing root and /etc, or /var;
    ##     in the latter case can modify bindings file location with bindings_file
    ##
    ## alias names: /dev/mapper/<alias>
    ##
    ##     can be problematic for file systems containing root and /etc
    ##
    ## in cluster use wwid names or alias names, since user-friendly names are
    ## node-local and not cluster-global (unless bindings file is manually
    ## replicated)
    ##
    # user_friendly_names no
    # bindings_file "/etc/multipath/bindings"
    # if yes, create multipaths only if there are 2+ paths to the same wwid
    # if no, create multipaths to all non-blacklisted devices
    find_multipaths yes
    ## directory where udev creates its device nodes
    # udev_dir /dev
    ## directory for plugins
    # multipath_dir "/lib/multipath"
    ## verbosity level (0 to 6, default = 2)
    # verbosity 2
    ## interval between two path checks (seconds, default: 5)
    # polling_interval 5
    ## timeout for path checkers, used to detect that the path has failed;
    ## this is a timeout for the SCSI command
    ## default is taken from /sys/block/sdx/device/timeout
    # checker_timeout 30    # in seconds
    ## FibreChannel specific timeout values
    # fast_io_fail_tmo ...
    # dev_loss_tmo ...
    ## method to determine the path's state
    # path_checker readsector0    # read the first sector of the device (default)
    # path_checker directio       # read the first sector with direct I/O
    # path_checker tur            # issue SCSI command TEST UNIT READY to the device,
    #                             # preferred compared to readsector0 or directio if the LUN supports tur,
    #                             # as on failure it does not fill up the system log with messages
    #                             #
    #                             # to verify TUR works fine, can issue manually:
    #                             #     sg_turs -v /dev/sdx
    #                             #
    # path_checker emc_clariion   # query EMC CLARiiON specific EVPD page 0xC0 to determine path state
    # path_checker hp_sw          # query HP storage array firmware
    # path_checker rdac           # RDAC LSI/Engenio storage controller proprietary
    ## If set to "once", multipathd logs the first path checker error at logging level 2,
    ## any later errors are logged at level 3 until the device is restored.
    ## If set to "always", multipathd always logs the path checker error at logging level 2.
    ## Default is "always".
    # log_checker_err "always"
    ## path selector algorithm to use within a primary pathgroup
    # path_selector "round-robin 0"     # spread load equally among all paths in the pathgroup
    # path_selector "queue-length 0"    # send next request batch to the path with the least amount of
    #                                   # outstanding IO
    # path_selector "service-time 0"    # choose path for the next request batch based on the amount of
    #                                   # outstanding IO to the path and its relative throughput
    ## path grouping policy:
    ##
    ## multibus           - all paths in the same priority group,
    ##                      traffic is load-balanced across all active paths in the group
    ## failover           - separate priority group per each path, so only one path is used at a time
    ## group_by_serial    - separate priority group per detected serial number (WWID)
    ## group_by_node_name - separate priority group per target node name
    ##                      (/sys/class/fc_transport/target*/node_name)
    ## group_by_prio      - separate priority group per each priority value;
    ##                      priorities are determined by the callout program specified in the
    ##                      config file on a per-multipath, per-controller
    ##                      or global basis (parameter prio_callout)
    ##
    ## A priority group is a set of paths that go to the same physical LUN.
    ## By default, I/O is distributed in a round-robin fashion across all paths in the group,
    ## unless an alternative path_selector is specified.
    ## The multipath framework multiplies the number of paths in a group by the group's priority to determine
    ## which group is the primary. The group with the highest calculated value is the primary.
    ## When all paths in the primary group have failed, the priority group with the
    ## next highest value becomes active.
    ##
    # path_grouping_policy "multibus"
    ## Callout program and args used to obtain path priority values,
    ## when path_grouping_policy is "group_by_prio".
    ##
    ## Implemented for EMC, NetApp, Compaq/HP, Hitachi and other vendors, as well as for SCSI-3 ALUA.
    ##
    ## The specified program will be executed and should return a numeric value specifying
    ## the relative priority of this path. Higher numbers have a higher priority.
    ## A '%n' in the command line will be expanded to the device name, a '%b' will be expanded
    ## to the device number in major:minor format.
    ##
    ## Default is "none".
    ##
    # prio_callout none
    # prio_callout mpath_prio_alua /dev/%n
    ## default program and args to obtain a unique path identifier
    # getuid_callout "/lib/udev/scsi_id -g -u -s"
    # getuid_callout "/lib/udev/scsi_id --replace-whitespace --whitelisted --device=/dev/%n"
    ## device-mapper features to be used
    ## format is "number_of_features_plus_arguments feature1 ..."
    # features "1 queue_if_no_path"
    ## "queue_if_no_path" is equivalent to setting no_path_retry to "queue" - i.e. if all paths
    ## are broken, all processes waiting for IO will hang until one or more paths are restored
    ##
    ## to disable queuing, set features to "0" and set no_path_retry to, say, 50
    ## see more in here
    ##
    ## pg_init_retries <n>       Retry path group initialization up to n times before failing
    ##                           where 1 <= n <= 50
    ##
    ## pg_init_delay_msecs <n>   Wait n milliseconds between path group initialization retries
    ##                           where 0 <= n <= 60000
    ##
    # features "4 pg_init_delay_msecs 1000 pg_init_retries 30"
    ## number of times to retry the path until multipath fails the path and disables queueing to it
    ##
    # no_path_retry 5        # retry specified number of times
    # no_path_retry 0        # (default) fail over right away
    # no_path_retry fail     # immediate failure (no queueing), same as "0"
    # no_path_retry queue    # never stop queueing - keep queueing forever until the path comes alive
    ## only for kernels < 2.6.31, newer ones use rr_min_io_rq
    ## number of IOs to route to a path before switching to the next path in the same group (default: 1000)
    # rr_min_io 1000
    ## number of IOs to route to a path before switching to the next path in the same group (default: 1)
    # rr_min_io_rq 1
    ## if set to "priorities", configurator will assign path weights as "path prio * rr_min_io_rq"
    ## if set to "uniform", all path weights are equal
    # rr_weight "priorities"
    # rr_weight "uniform"
    ## If set to "yes", multipath will try to detect if the device supports ALUA.
    ## If so, the device will automatically use the ALUA prioritizer.
    ## If not, the prioritizer will be selected as usual.
    ## Default is "no".
    ##
    # detect_prio "no"
    ## default method to set priority to the paths
    ##
    # prio const       # set priority 1 to all paths (default)
    # prio alua        # generate path priority based on SCSI-3 ALUA settings
    # prio tpg_pref    # generate path priority based on SCSI-3 ALUA settings, using the preferred port bit
    # prio emc         # for EMC arrays
    # prio ontap       # for NetApp arrays
    # prio rdac        # for LSI/Engenio/NetApp E-series RDAC controller
    # prio hp_sw       # for HP/Compaq controller in active/standby mode
    # prio hds         # for Hitachi arrays
    ## Specifies whether to monitor the failed path recovery, and indicates the timing for group failback
    ## after failed paths return to service.
    ## When the failed path recovers, the path is added back into the multipath-enabled path list based on
    ## this setting. Multipath evaluates the priority groups, and changes the active priority group when
    ## the priority of the primary path exceeds the secondary group.
    ##
    # failback immediate     # When a path recovers, enable the path immediately.
    # failback 5             # When the path recovers, wait n seconds before enabling the path.
    #                        # Specify an integer value greater than 0.
    # failback manual        # (Default) The failed path is not monitored for recovery.
    #                        # The administrator runs the multipath command to update enabled paths
    #                        # and priority groups.
    # failback followover    # only perform automatic failback when the first path of a pathgroup becomes active,
    #                        # this keeps a node from automatically failing back when another node requested
    #                        # the failover
    ## if set to "no", multipathd will disable queueing for all devices when it is shut down (default: "yes")
    # queue_without_daemon "yes"
    ## if set to "yes", multipathd will disable queueing for all devices when the last path
    ## to the device has been deleted (default: "no")
    # flush_on_last_del "yes"
    ## Service action reservation key used by mpathpersist.
    ## Must be set for all multipath devices using persistent reservations.
    ## Must be the same as the RESERVATION KEY field of the PERSISTENT RESERVE OUT parameter list
    ## which contains an 8-byte value provided by the application client to the device server
    ## to identify the I_T nexus.
    ## Default: unset.
    ##
    # reservation_key ...
    ## udev attribute that provides a unique path identifier (default: ID_SERIAL)
    # uid_attribute "ID_SERIAL"
}
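The primary-group rule described in the path_grouping_policy comments above (number of paths in a group multiplied by the group's priority, highest value wins) can be worked through with a few lines of awk. This only illustrates the rule as stated in these notes; the input format "<group> <paths> <prio>" and the function name pick_primary_group are assumptions, and the actual calculation may differ between multipath-tools versions.

```shell
# Read lines "<group-name> <number-of-paths> <group-priority>" on stdin and
# print the group with the highest paths*priority value, plus that value.
# On a tie the first group seen is kept.
pick_primary_group() {
    awk 'NF == 3 {
             score = $2 * $3
             if (best == "" || score > max) { max = score; best = $1 }
         }
         END { if (best != "") print best, max }'
}

# Example (not executed here):
#   printf '%s\n' 'pg1 2 10' 'pg2 4 1' | pick_primary_group
```

With two paths at priority 10 in pg1 and four paths at priority 1 in pg2, pg1 scores 20 against 4 and is selected as the primary.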
Define multipath attributes for each specific multipath device.
Specify alias and override defaults specified in defaults and devices, based on wwid.
Can override here:
path_grouping_policy
path_selector
failback
prio
prio_args
no_path_retry
rr_min_io
rr_weight
flush_on_last_del
multipaths {
    multipath {
        wwid                 3600508b4000156d700012000000b0000
        # alias: optional, symbolic name for multipath map
        alias                data_1
        path_grouping_policy multibus
        path_checker         readsector0
        path_selector        "round-robin 0"
        failback             manual
        rr_weight            priorities
        no_path_retry        5
    }
    multipath {
        wwid                 1DEC_____321816758474
        alias                data_2
        rr_weight            priorities
    }
}
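When many LUNs need aliases, the multipaths section can be generated from a list of "wwid alias" pairs rather than typed by hand. A minimal sketch; the function names multipath_stanza and multipaths_section are hypothetical, and only the wwid and alias keywords shown above are emitted.

```shell
# Print one multipath { } stanza for the given WWID and alias.
multipath_stanza() {
    printf '    multipath {\n        wwid  %s\n        alias %s\n    }\n' "$1" "$2"
}

# Read "wwid alias" pairs on stdin and print a complete multipaths section.
multipaths_section() {
    echo 'multipaths {'
    while read -r wwid alias; do
        if [ -n "$wwid" ] && [ -n "$alias" ]; then
            multipath_stanza "$wwid" "$alias"
        fi
    done
    echo '}'
}

# Example (not executed here):
#   echo '36e843b668f682e2d00a0d4740d912ada xs2' | multipaths_section
```

The output is meant to be reviewed and pasted into /etc/multipath.conf, not appended blindly.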
Define devices.
Specify settings that override the defaults on a
per-vendor/product/revision basis.
Vendor: 8 characters.
Product: 16 characters.
Revision: 4 characters.
Trailing spaces are important!
Can override here:
path_grouping_policy
path_selector
failback
prio
prio_args
no_path_retry
rr_min_io
rr_weight
flush_on_last_del
getuid_callout
path_checker
features
fast_io_fail_tmo
dev_loss_tmo
devices {
    device {
        vendor               "COMPAQ  "
        product              "HSV110 (C)COMPAQ"
        #revision            "1530"
        path_grouping_policy multibus
        path_checker         readsector0
        path_selector        "round-robin 0"
        failback             15
        rr_weight            priorities
        no_path_retry        queue
        # Module to perform hardware-specific actions
        # when switching paths or handling IO errors.
        #
        #   1 emc   => for EMC storage arrays
        #   1 alua  => for SCSI-3 ALUA arrays
        #   1 hp_sw => for Compaq/HP controllers
        #   1 rdac  => for LSI/Engenio RDAC controllers
        #
        hardware_handler     "1 hp_sw"
    }
    device {
        vendor               "COMPAQ  "
        product              "MSA1000         "
        path_grouping_policy multibus
    }
    # specify which products of specific vendor to blacklist
    device {
        vendor               "Emulex"
        product_blacklist    "XYZ1*"
    }
}
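Because the vendor and product fields are fixed-width (8 and 16 characters, as noted above) and trailing spaces matter, it can help to generate the padded strings rather than count spaces by hand. A small sketch; the function names pad_vendor and pad_product are hypothetical.

```shell
# Print the value left-justified and space-padded to the fixed SCSI field
# width, wrapped in double quotes ready for multipath.conf.
pad_vendor()  { printf '"%-8s"\n'  "$1"; }   # vendor: 8 characters
pad_product() { printf '"%-16s"\n' "$1"; }   # product: 16 characters

# Example (not executed here):
#   pad_vendor COMPAQ
#   pad_product MSA1000
```

Values already at full width, such as "HSV110 (C)COMPAQ", pass through unchanged.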
Example /etc/multipath.conf:
blacklist {
    wwid "*"
}
blacklist_exceptions {
    wwid 36e843b668f682e2d00a0d4740d912ada
}
defaults {
    # verbosity 2
    # bindings_file "/etc/multipath/bindings"
    user_friendly_names no
    find_multipaths yes
    polling_interval 5
    rr_min_io_rq 1
    # getuid_callout "/lib/udev/scsi_id --replace-whitespace --whitelisted --device=/dev/%n"
    # queue_without_daemon yes
    # flush_on_last_del "yes"
}
devices {
    device {
        vendor "QNAP +"
        product "iSCSI Storage +"
        path_grouping_policy multibus
        path_selector "service-time 0"
        # path_checker readsector0    # read the first sector of the device (default)
        # path_checker directio       # read the first sector with direct I/O
        # issue SCSI command TEST UNIT READY to the device
        path_checker tur
        failback 5
        prio const
        rr_weight uniform
        features "0"
        no_path_retry 0
    }
}
multipaths {
    multipath {
        wwid 36e843b668f682e2d00a0d4740d912ada
        alias xs2
    }
}
After editing multipath.conf, the changes are not applied automatically.
To display what the setup would look like (dry run), execute
multipath -v3 -d
(the displayed paths are grouped into priority groups).
Start the multipath daemon:
# modprobe dm-multipath
systemctl restart multipathd.service
Verify multipath volumes are configured properly:
multipath -ll
multipath -ll -v 3
multipathd show config
multipathd show paths
multipathd show maps
multipathd show maps status
multipathd show maps stats
multipathd show topology
multipathd show devices
multipathd show blacklist
multipathd reconfigure
dmsetup ls --tree
dmsetup info
dmsetup status
dmsetup table
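To get a quick health summary from the verification output, the path lines of multipath -ll can be tallied by state. This is a sketch that assumes the common path-line layout "<h:c:t:l> <dev> <major:minor> <dm-state> <path-state> ..."; the layout can vary between versions, and the function name path_states is hypothetical.

```shell
# Read "multipath -ll" output on stdin and count paths per state pair
# (for example "active ready" or "failed faulty").
path_states() {
    awk '{
             for (i = 1; i <= NF; i++)
                 # A field of the form H:C:T:L marks a path line; the two
                 # state words follow the device name and major:minor.
                 if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) {
                     n[$(i + 3) " " $(i + 4)]++
                     break
                 }
         }
         END { for (s in n) print n[s], s }' | sort -k2
}

# Example (not executed here):
#   multipath -ll | path_states
```

Any state other than "active ready" in the summary is a reason to look at the full multipath -ll output.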
To reconfigure (if needed):
systemctl stop multipathd.service
multipath -F                    => flush all unused maps
multipath -r [-v3]              => reload
multipath -f /dev/mapper/xxx    => flush if unused
systemctl restart multipathd.service
Optionally test disk access:
hdparm -tT /dev/mapper/xs2
fdisk cannot be used with /dev/mapper/xxx devices. Use fdisk on the
underlying disk and then execute the kpartx
command to recognize the partition:
fdisk /dev/sdx
kpartx -avs /dev/mapper/xs2    => creates /dev/mapper/xs2p1
Enable the multipath daemon to be started on boot:
systemctl enable multipathd.service
Replicate /etc/multipath.conf to other nodes in the cluster:
rsync -av /etc/multipath.conf vc2:/etc
rsync -av /etc/multipath.conf vc3:/etc
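With more than a couple of nodes, the replication step is easier as a loop. A sketch under the assumption that the node names (vc2, vc3) resolve and accept rsync over ssh as in the commands above; the function name replicate_multipath_conf and the RSYNC override variable are hypothetical.

```shell
# Command used to copy the file; set RSYNC="echo rsync" for a dry run
# that only prints the commands.
RSYNC=${RSYNC:-rsync}

# Usage: replicate_multipath_conf <node> [<node> ...]
# Stops at the first node for which the copy fails.
replicate_multipath_conf() {
    for node in "$@"; do
        $RSYNC -av /etc/multipath.conf "$node:/etc/" || return 1
    done
}

# Example (not executed here):
#   replicate_multipath_conf vc2 vc3
```

After replication, multipathd on each node still has to pick up the new file, as described in the reconfigure steps above.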
Configure LVM to ignore underlying single-path devices.
Otherwise LVM will detect multiple copies of the same physical volume.
Edit /etc/lvm/lvm.conf:
On newer versions of LVM, ensure in lvm.conf:
devices/multipath_component_detection=1
verify: lvm dumpconfig | grep multipath_component_detection
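The verification can be turned into a pass/fail check, which is convenient when repeating it on every node. A minimal sketch that assumes the setting appears in the dumped configuration as a line "multipath_component_detection=1" (optionally with spaces around "="); the function name mp_detection_enabled is hypothetical.

```shell
# Read LVM configuration text on stdin; return 0 only if
# multipath_component_detection is set to 1.
mp_detection_enabled() {
    grep -Eq '^[[:space:]]*multipath_component_detection[[:space:]]*=[[:space:]]*1[[:space:]]*$'
}

# Example (not executed here):
#   lvm dumpconfig | mp_detection_enabled || echo "enable multipath_component_detection"
```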
Repeat the procedure for other nodes in the cluster.
Make sure the cluster is started:
pcs cluster status
corosync-quorumtool -oi -i
If not, then start it: pcs cluster start --all
Create LVM volume
pvcreate /dev/mapper/xs2p1
vgcreate --clustered y vg2 /dev/mapper/xs2p1
lvcreate vg2 --name lv1 --extents 100%FREE
dmsetup ls --tree
vgdisplay vg2
vgs
lvdisplay vg2
Proceed with creating a file system on /dev/vg2/lv1
To remove a path from multi-pathed storage device, execute on the underlying sdx path:
echo offline > /sys/block/sdx/device/state
echo 1 > /sys/block/sdx/device/delete
or using corresponding hba:ch:trg:lun
echo offline > /sys/class/scsi_device/hba:ch:trg:lun/device/state
echo 1 > /sys/class/scsi_device/hba:ch:trg:lun/device/delete
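The two sysfs writes above can be wrapped in a function that first checks that the device exists, which guards against typos in the device name. A sketch only: the function name remove_scsi_path is hypothetical, and the optional second argument (an alternative sysfs root) exists purely so the function can be tried against a scratch directory.

```shell
# Usage: remove_scsi_path <sdX> [<sysfs-root>]
# Marks the path device offline, then asks the kernel to delete it.
remove_scsi_path() {
    dev=$1
    sysfs=${2:-/sys}
    devdir="$sysfs/block/$dev/device"
    [ -d "$devdir" ] || { echo "no such device: $dev" >&2; return 1; }
    echo offline > "$devdir/state" &&
    echo 1 > "$devdir/delete"
}

# Example (not executed here, requires root):
#   remove_scsi_path sdx
```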
Adding a storage device or path:
see here (25.10, 25.11)
after adding a path device, execute:
multipath -r
multipath -ll