Linpack Test Procedure


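1. Install OpenBLAS

Linpack needs a math library for matrix operations; the available options are BLAS and VSIPL. Here we install OpenBLAS, which includes Loongson vector optimizations.

Step 1: Download the OpenBLAS source. You can use either the community version or the internal version of OpenBLAS. Compared with the community version, the internal version enables huge pages by default and uses a more efficient $16\times6$ dgemm kernel, so it reaches a higher Linpack floating-point peak. All of the following steps use the internal OpenBLAS.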

git clone "http://rd.loongson.cn:8081/media/openblas"
cd openblas
git checkout -b linpack remotes/origin/linpack

Step 2: Build and install

make NO_LAPACK=1 USE_SIMPLE_THREADED_LEVEL3=1 NO_AFFINITY=0
make install PREFIX=/opt/openblas

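The installation directory given by the PREFIX=/opt/openblas option is used again later, when editing the Make file before compiling HPL.

Step 3: Set the OpenBLAS environment variable. Edit /etc/profile and add export LD_LIBRARY_PATH=/opt/openblas/lib:$LD_LIBRARY_PATH, then run the following command so that it takes effect.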

source /etc/profile
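
As a quick sanity check after sourcing the profile, the sketch below tests whether a directory is one of the entries of a colon-separated path list. The helper name path_contains is hypothetical and introduced only for illustration; the directory /opt/openblas/lib is the one used in this guide.

```shell
# Return success if directory $1 is an entry of the colon-separated list $2.
# path_contains is a hypothetical helper name, not part of OpenBLAS or HPL.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether the OpenBLAS library directory is on LD_LIBRARY_PATH.
if path_contains /opt/openblas/lib "${LD_LIBRARY_PATH:-}"; then
    echo "/opt/openblas/lib is on LD_LIBRARY_PATH"
else
    echo "/opt/openblas/lib is NOT on LD_LIBRARY_PATH"
fi
```

The same helper can be pointed at PATH to confirm that the mpich bin directory was added, for example path_contains /usr/lib64/mpich/bin "$PATH".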

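2. Install mpich

Step 1: Install dependencies. For MPI we use the mpich packages from the distribution repository. The required packages are mpich and mpich-devel. On a loongnix-server system, run: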

sudo yum install mpich mpich-devel

Step 2: Set the mpich environment variable. Edit /etc/profile and add export PATH=/usr/lib64/mpich/bin:$PATH, then run the following command so that it takes effect.

source /etc/profile

You can run rpm -ql mpich to see where mpich is installed. This location is also needed later, when editing the Make file before compiling HPL.

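3. Compile HPL

Compiling HPL first requires a build configuration file named Make.arch_name, for example Make.Linux_la64. This file mainly specifies the locations of the libraries and header files that HPL depends on, so its contents follow from the paths chosen when installing mpich and OpenBLAS above. Place the file in the HPL source directory and run make arch=arch_name to build, for example make arch=Linux_la64.

Step 1: Unpack the HPL source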

cd /home/hpc
tar xf hpl-2.2.tar.gz
cd hpl-2.2

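Step 2: Create and edit the Make.Linux_la64 configuration file. The hpl-2.2/setup directory in the HPL source tree contains reference configuration files for several architectures, which can be used as a starting point. On Loongson platforms, the following configuration file can be used as a reference.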

# cat Make.Linux_la64

# ######################################################################
#
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -fs
MKDIR        = mkdir -p
RM           = /bin/rm -f
TOUCH        = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH         = Linux_la64
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
# TODO: change TOPdir to the path of your HPL source tree
TOPdir       = /home/yinsy/linpack/hpl-2.2
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the  C  compiler where to find the Message Passing library
# header files,  MPlib  is defined  to be the name of  the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir = /usr/lib64/mpich
MPinc = -I$(MPdir)/include
MPlib = $(MPdir)/lib/libmpi.so
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the  C  compiler where to find the Linear Algebra  library
# header files,  LAlib  is defined  to be the name of  the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir = /opt/openblas
LAinc = $(LAdir)/include
LAlib = $(LAdir)/lib/libopenblas.a
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section  if and only if  you are not planning to use
# a  BLAS  library featuring a Fortran 77 interface.  Otherwise,  it  is
# necessary  to  fill out the  F2CDEFS  variable  with  the  appropriate
# options.  **One and only one**  option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_              : all lower case and a suffixed underscore  (Suns,
#                       Intel, ...),                           [default]
# -DNoChange          : all lower case (IBM RS6000),
# -DUpCase            : all upper case (Cray),
# -DAdd__             : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int,         [default]
# -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle    : The string address is passed at the string loca-
#                       tion on the stack, and the string length is then
#                       passed as  an  F77_INTEGER  after  all  explicit
#                       stack arguments,                       [default]
# -DStringStructPtr   : The address  of  a  structure  is  passed  by  a
#                       Fortran 77  string,  and the structure is of the
#                       form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal   : A structure is passed by value for each  Fortran
#                       77 string,  and  the  structure is  of the form:
#                       struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle   : Special option for  Cray  machines,  which  uses
#                       Cray  fcd  (fortran  character  descriptor)  for
#                       interoperation.
#
F2CDEFS      = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) -I$(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
# -DHPL_CALL_VSIPL       call the vsip  library;
# -DHPL_DETAILED_TIMING  enable detailed timers;
#
# By default HPL will:
#    *) not copy L before broadcast,
#    *) call the BLAS Fortran 77 interface,
#    *) not display detailed timing information.
#
HPL_OPTS     = -DHPL_CALL_CBLAS
#
# ----------------------------------------------------------------------
#
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC       = mpicc
CCNOOPT  = $(HPL_DEFS)
CCFLAGS  = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -pthread -lm
#
# On some platforms,  it is necessary  to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER       = mpif77
LINKFLAGS    = $(CCFLAGS) $(OMP_DEFS)
#
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
#
# ----------------------------------------------------------------------

Step 3: Build the HPL executable. Run the following command to compile.

# make arch=Linux_la64

After compilation completes, the executable xhpl and the test configuration file HPL.dat are generated in the hpl-2.2/bin/Linux_la64 directory.

4. Run the HPL test

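Step 1: Edit the HPL.dat configuration file. To get the best performance on a given hardware platform, HPL.dat must be tuned for that platform. The most important lines of the reference configuration below are explained here.

Ns: the problem size, set according to the amount of memory. It is calculated as N^2 = 0.9 * memory_size (in bytes) / 8.

NBs: the matrix block size, which depends on the hardware. An integer multiple of cacheline/8 is recommended.

# of process grids (P x Q): the number of P x Q combinations to test. The example tests 2 grids, 1x4 and 2x2. To avoid the performance loss caused by threads communicating across NUMA nodes, run one process per NUMA node, so that P x Q equals the number of NUMA nodes. Recommended values: 1x1 for 3A5000; 2x2 for single-socket 3C5000L and 2x4 for dual-socket 3C5000L; 1x1 for single-socket 3C5000, 1x2 for dual-socket 3C5000 and 2x2 for quad-socket 3C5000; 1x2 for single-socket 3D5000 and 2x2 for dual-socket 3D5000.

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
174500       Ns
1            # of NBs
304          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
2            # of process grids (P x Q)
1 2          Ps
4 2          Qs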
16.0         threshold
1            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
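
The Ns formula and the process grid choice can be worked out with a small script. The sketch below computes N from the formula N^2 = 0.9 * memory_size / 8 and picks the most nearly square P x Q grid with P <= Q, which reproduces the grids recommended in this guide (1x1, 1x2, 2x2, 2x4). The function names are hypothetical. Rounding N down to a multiple of NB is a common convention added here as an assumption; the guide itself does not require it.

```shell
# Largest N with N^2 <= 0.9 * mem_bytes / 8, rounded down to a multiple of NB.
# $1 = memory size in bytes, $2 = NB (optional; default 1 means no rounding).
hpl_problem_size() {
    awk -v mem="$1" -v nb="${2:-1}" 'BEGIN {
        n = int(sqrt(0.9 * mem / 8))
        print n - (n % nb)
    }'
}

# Most nearly square P x Q grid with P <= Q for $1 processes; prints "P Q".
hpl_grid() {
    np=$1
    p=1
    i=1
    while [ $((i * i)) -le "$np" ]; do
        if [ $((np % i)) -eq 0 ]; then
            p=$i
        fi
        i=$((i + 1))
    done
    echo "$p $((np / p))"
}

# Example usage with the memory size reported by the kernel (MemTotal is in kB):
#   mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
#   hpl_problem_size "$((mem_kb * 1024))" 304
#   hpl_grid 8
```

Treat the computed N as a starting point. If the run starts swapping or fails to allocate memory, a smaller value is needed.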

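Step 2: Set the number of OpenBLAS threads. OPENBLAS_NUM_THREADS specifies the number of threads started by each OpenBLAS process. To avoid the performance loss caused by threads communicating across NUMA nodes, assign one process to each NUMA node and set the thread count to the number of cores per NUMA node. The settings for each hardware platform are as follows.

3A5000 and 3C5000L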

export OPENBLAS_NUM_THREADS=4

3C5000 and 3D5000

export OPENBLAS_NUM_THREADS=16
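
The thread count can also be derived from the machine rather than looked up. The sketch below divides the total core count by the number of NUMA nodes, which gives 4 on 3A5000 and 3C5000L and 16 on 3C5000 and 3D5000, matching the values above. The function names are hypothetical, and the sketch assumes that all NUMA nodes have the same number of cores and that nodes are listed under /sys/devices/system/node.

```shell
# Threads per process = cores per NUMA node = total cores / NUMA nodes.
# $1 = total core count, $2 = NUMA node count.
threads_per_node() {
    echo $(( $1 / $2 ))
}

# Count NUMA nodes listed in sysfs; fall back to 1 if none are found.
count_numa_nodes() {
    n=$(( $(ls -d /sys/devices/system/node/node[0-9]* 2>/dev/null | wc -l) ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# Example usage:
#   export OPENBLAS_NUM_THREADS=$(threads_per_node "$(nproc)" "$(count_numa_nodes)")
```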

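Step 3: Set the number of huge pages available to the system. Because huge-page memory allocation is enabled by default, the number of huge pages must be set to match the number of Linpack threads. The settings for each hardware platform are as follows.

3A5000

echo 4 > /proc/sys/vm/nr_hugepages

Single-socket 3C5000L

echo 16 > /proc/sys/vm/nr_hugepages

Dual-socket 3C5000L

echo 32 > /proc/sys/vm/nr_hugepages

Single-socket 3C5000

echo 16 > /proc/sys/vm/nr_hugepages

Dual-socket 3C5000

echo 32 > /proc/sys/vm/nr_hugepages

Quad-socket 3C5000

echo 64 > /proc/sys/vm/nr_hugepages

Single-socket 3D5000

echo 32 > /proc/sys/vm/nr_hugepages

Dual-socket 3D5000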

echo 64 > /proc/sys/vm/nr_hugepages
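
Writing to /proc/sys/vm/nr_hugepages requires root, and the kernel may reserve fewer pages than requested if it cannot find enough free memory. The sketch below reads the reserved count back from the HugePages_Total field so the setting can be checked before the run. The function names are hypothetical, and the optional file argument exists only so the parsing can be tried on a saved copy of /proc/meminfo.

```shell
# Print the HugePages_Total value from a meminfo-style file.
# $1 = file to read (optional; defaults to /proc/meminfo).
hugepages_total() {
    awk '/^HugePages_Total:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Succeed if the reserved huge page count equals the expected value.
# $1 = expected count, $2 = file to read (optional).
hugepages_match() {
    actual=$(hugepages_total "${2:-/proc/meminfo}")
    [ "$actual" = "$1" ]
}

# Example usage after setting 16 huge pages:
#   hugepages_match 16 || echo "huge pages reserved: $(hugepages_total), expected 16"
```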

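Step 4: Run the test

For a single process, run ./xhpl directly. For multiple processes, use the mpirun command to start the xhpl test, with the number of processes equal to the number of NUMA nodes. The commands for each platform are as follows.

3A5000

cd bin/Linux_la64
./xhpl

Single-socket 3C5000L

cd bin/Linux_la64
mpirun -np 4 ./xhpl

Dual-socket 3C5000L

cd bin/Linux_la64
mpirun -np 8 ./xhpl

Single-socket 3C5000

cd bin/Linux_la64
./xhpl

Dual-socket 3C5000

cd bin/Linux_la64
mpirun -np 2 ./xhpl

Quad-socket 3C5000

cd bin/Linux_la64
mpirun -np 4 ./xhpl

Single-socket 3D5000

cd bin/Linux_la64
mpirun -np 2 ./xhpl

Dual-socket 3D5000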

cd bin/Linux_la64
mpirun -np 4 ./xhpl
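
The per-platform values from steps 2 to 4 can be collected in one place. The sketch below is a lookup that returns the thread count, huge page count and process count for a platform, plus a wrapper that applies them. The platform labels (such as 3C5000L-2 for dual-socket 3C5000L) and the function names are hypothetical and chosen only for this example; the numbers are the ones listed in this guide. In every row the huge page count equals threads times processes.

```shell
# Print "threads hugepages processes" for a platform label.
# Labels are hypothetical: <chip>-<socket count>, or 3A5000.
linpack_settings() {
    case "$1" in
        3A5000)    echo "4 4 1" ;;
        3C5000L-1) echo "4 16 4" ;;
        3C5000L-2) echo "4 32 8" ;;
        3C5000-1)  echo "16 16 1" ;;
        3C5000-2)  echo "16 32 2" ;;
        3C5000-4)  echo "16 64 4" ;;
        3D5000-1)  echo "16 32 2" ;;
        3D5000-2)  echo "16 64 4" ;;
        *) echo "unknown platform: $1" >&2; return 1 ;;
    esac
}

# Apply the settings and start xhpl. Must be run as root (it writes
# nr_hugepages) from the bin/Linux_la64 directory.
run_linpack() {
    settings=$(linpack_settings "$1") || return 1
    set -- $settings
    export OPENBLAS_NUM_THREADS=$1
    echo "$2" > /proc/sys/vm/nr_hugepages
    if [ "$3" -eq 1 ]; then
        ./xhpl
    else
        mpirun -np "$3" ./xhpl
    fi
}

# Example usage:
#   run_linpack 3C5000L-1
```

The P x Q grid in HPL.dat still has to be set by hand so that P x Q equals the process count.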

©Loongson Open Source Community, all rights reserved. Powered by Gitbook. Document last updated: 2024-04-16 14:42:33
