Lotus test results based on ARM architecture

Test results of Kunpeng 920 chip

With development effort from two Chinese companies, Huawei and Wheat Field Cloud, Lotus was tested on the ARM architecture. Using Lotus's own lotus bench tool, the Kunpeng 920 Model 7265 chip completed PreCommit phase 1 in 35.78 s, essentially on par with the 35.2 s measured on the 7nm AMD 7302 chip.
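To put the two timings side by side, the relative gap can be worked out with a one-line awk call. This is only a sketch: the helper name rel_diff_pct is made up for illustration, and it assumes both figures are in seconds.

```shell
#!/bin/sh
# rel_diff_pct A B: percentage by which A exceeds B, to two decimals.
# (Hypothetical helper, not part of lotus.)
rel_diff_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (a - b) / b * 100 }'
}

# PreCommit phase 1 times quoted above (assumed to be seconds for both chips).
rel_diff_pct 35.78 35.2   # prints 1.65
```

So the Kunpeng run is under 2% slower than the AMD run on this phase.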

Test environment configuration

Hardware configuration
CPU: 2 * Kunpeng 920, 64 cores @ 3.0 GHz
DDR: 16 * 32 GB, 3200 MHz
Hard disk: ES3510P V5 SSD, 4 TB
Filecoin software version: 0.4.2
Testing tool: lotus bench
Test command: lotus-bench sealing --sector-size=512M --storage-dir=TMP_DIR
Operating system: CentOS 7.6 (kernel 5.3)
Dependency package: GCC 9.2.0 and above
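
Before building, it may help to confirm that the host meets the GCC requirement listed above. The following is a minimal sketch: version_ge is a hypothetical helper, and it assumes GNU sort (for -V) is available.

```shell
#!/bin/sh
# Minimum GCC version, taken from the dependency list above.
MIN_GCC="9.2.0"

# version_ge A B: succeed when version A >= version B.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only probe the compiler when one is installed.
if command -v gcc >/dev/null 2>&1; then
    have="$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion)"
    if version_ge "$have" "$MIN_GCC"; then
        echo "GCC $have is new enough"
    else
        echo "GCC $have is older than $MIN_GCC"
    fi
fi

uname -m   # expected to print aarch64 on a Kunpeng host
```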

Compatibility Compilation Strategy

Download Project

git clone https://github.com/filecoin-project/lotus.git
cd lotus

Run only the checkout that matches the version being built:

git checkout -b mybuild v0.4.1 # compile lotus v0.4.1

git checkout -b mybuild v0.4.2 # compile lotus v0.4.2

git submodule update --init --recursive


cd extern/filecoin-ffi

Run only the checkout that matches the lotus version being built:

git checkout -b mybuild v0.30.2 # compile lotus v0.4.1

git checkout -b mybuild v0.30.3 # compile lotus v0.4.2

git am 0001-patch-to-run-on-arm64.patch
rm rust/Cargo.lock

Compile lotus

cd ../../


Compile bench tools

make bench # compile 0.4.1

make lotus-bench # compile 0.4.2
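
Because the filecoin-ffi tag and the make target both depend on the lotus version, the pairing from the steps above can be captured in a small dry-run script. This is a sketch, not part of the lotus repository: the function names and the LOTUS_VERSION variable are hypothetical, and the script only prints the version-dependent commands rather than running them.

```shell
#!/bin/sh
# Map a lotus tag to the filecoin-ffi tag listed in the steps above.
ffi_tag_for() {
    case "$1" in
        v0.4.1) echo "v0.30.2" ;;
        v0.4.2) echo "v0.30.3" ;;
        *) echo "unsupported lotus version: $1" >&2; return 1 ;;
    esac
}

# Map a lotus tag to the make target that builds the bench tool.
bench_target_for() {
    case "$1" in
        v0.4.1) echo "bench" ;;
        v0.4.2) echo "lotus-bench" ;;
        *) echo "unsupported lotus version: $1" >&2; return 1 ;;
    esac
}

# Print the version-dependent commands instead of running them (dry run).
print_plan() {
    ffi_tag="$(ffi_tag_for "$1")" || return 1
    target="$(bench_target_for "$1")" || return 1
    echo "git checkout -b mybuild $1"
    echo "(cd extern/filecoin-ffi && git checkout -b mybuild $ffi_tag)"
    echo "make $target"
}

# LOTUS_VERSION is a hypothetical variable; v0.4.2 is used when it is unset.
print_plan "${LOTUS_VERSION:-v0.4.2}"
```

Printing the plan first makes it easy to check that the two tags line up before anything is checked out.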

Verify the compilation result

banana@e0481debdb2d:~/lotus$ ls -ld lotus* bench

-rwxr-xr-x 1 banana banana 112529533 Jul 27 11:47 bench
-rwxr-xr-x 1 banana banana 133671341 Jul 27 11:46 lotus
-rwxr-xr-x 1 banana banana 112638493 Jul 27 11:47 lotus-seal-worker
-rwxr-xr-x 1 banana banana 133347085 Jul 27 11:46 lotus-storage-miner
drwxrwxr-x 3 banana banana      4096 Jul 27 11:15 lotuspond
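
The same check can be scripted instead of read off the listing by eye. The sketch below assumes the binaries sit in the current directory unless BUILD_DIR (a hypothetical variable) says otherwise; missing_bins is likewise a made-up helper name.

```shell
#!/bin/sh
# missing_bins DIR NAME...: print each NAME that is not an executable
# regular file in DIR.
missing_bins() {
    dir="$1"; shift
    for name in "$@"; do
        { [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; } || printf '%s\n' "$name"
    done
}

# Binary names are taken from the listing above; the directory is an assumption.
BUILD_DIR="${BUILD_DIR:-.}"
missing="$(missing_bins "$BUILD_DIR" bench lotus lotus-seal-worker lotus-storage-miner)"
if [ -z "$missing" ]; then
    echo "all binaries present"
else
    echo "missing: $(echo $missing)"
fi
```

On an ARM build host, running the file utility on one of the binaries should additionally report an aarch64 ELF executable.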

Optimization measures

System Optimization Strategy

(1) Mount tmp directory with tmpfs

(2) Core binding (pin the process to specific CPU cores)

(3) In the BIOS, enable performance mode and prefetching, and set the memory refresh interval to 64 ms
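
Strategies (1) and (2) could look roughly like the following on a Linux host. This is a sketch under stated assumptions: the mount point, tmpfs size, and CPU range are illustrative values that the post does not give, and the helper names are hypothetical. The script prints the commands rather than running them, since mounting requires root. taskset comes from util-linux.

```shell
#!/bin/sh
# Illustrative values only; none of these come from the original test setup.
TMP_DIR="${TMP_DIR:-/mnt/lotus-tmp}"   # hypothetical mount point
TMPFS_SIZE="${TMPFS_SIZE:-256G}"       # hypothetical size; must fit in RAM

# Print the tmpfs mount command (strategy 1).
tmpfs_mount_cmd() {
    printf 'mount -t tmpfs -o size=%s tmpfs %s\n' "$TMPFS_SIZE" "$TMP_DIR"
}

# Print a command line pinned to a CPU list with taskset (strategy 2).
pinned_cmd() {
    cpus="$1"; shift
    printf 'taskset -c %s %s\n' "$cpus" "$*"
}

tmpfs_mount_cmd
# Other bench flags are omitted here; only the storage directory is shown.
pinned_cmd 0-63 lotus-bench sealing --storage-dir="$TMP_DIR"
```

On a two-socket machine, choosing a CPU range that stays within one socket keeps memory accesses local, which matters when memory latency is the bottleneck.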

Software Optimization Strategy

A top-down analysis of PreCommit phase 1 shows that the performance bottleneck is memory latency. The L1 cache hit rate is about 98% before optimization and 99.2% after.

The parameters of the compress256 function were changed so that the address is passed in directly, which avoids copying the array.

When the blocks array is initialized, the application code calls a prefetch function in advance. The Kunpeng software prefetch interface is implemented with inline assembly in a newly added compress256_prefetch function, which improves the L1 hit rate.
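
The L1 hit rates quoted above can be derived from two hardware counters, loads and load misses. The sketch below shows the arithmetic only; the counter values are made-up round numbers chosen to reproduce 98% and 99.2%, and hit_rate_pct is a hypothetical helper. The perf event names in the comment are the generic Linux ones, and whether they are exposed depends on the CPU and kernel.

```shell
#!/bin/sh
# hit_rate_pct LOADS MISSES: L1 hit rate as a percentage, two decimals.
hit_rate_pct() {
    awk -v loads="$1" -v misses="$2" 'BEGIN {
        if (loads <= 0) { print "n/a"; exit }
        printf "%.2f\n", (1 - misses / loads) * 100
    }'
}

# The counters could be collected with something like:
#   perf stat -e L1-dcache-loads,L1-dcache-load-misses -- <bench command>

# Made-up counter values for illustration.
hit_rate_pct 1000000 20000   # prints 98.00
hit_rate_pct 1000000 8000    # prints 99.20
```

A 1.2-point gain looks small, but it means the miss rate drops from 2% to 0.8%, that is, well under half as many L1 misses.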

I wonder what results they would get with a 32 GB sector size.