Recommendations: (1) use a modern server with a reasonable amount of
RAM, (2) limit the number of threads that saferewrite will use (see the
THREADS mechanism in README), and (3) select the analyses that you're
interested in (see the chmod mechanism in README).
The following measurements were collected on rome2, a dual EPYC 7742
(128 cores overall), overclocking disabled (so CPUs were running at
2.245GHz), 512GB RAM, 1TB swap space, running Debian 12.
One run (before elfulator was installed) analyzed all 248 cryptoint
functions (not just the 248 cryptoint implementations of those functions
but also 391 other implementations):

    chmod +t src/*
    chmod -t src/{int,uint}{8,16,32,64}_*
    chmod +t src/uint8_7bit_nonzero_mask_int16
    env THREADS=64 time ./analyze

There were 639 implementations in total, each of which was compiled with
12 compilers, so 7668 implementation-compiler combinations total.
Timings:

    21884.21user 6898.45system 10:05.60elapsed 4752%CPU (0avgtext+0avgdata 1117584maxresident)k
    302048inputs+3216840outputs (2182847major+911883360minor)pagefaults 0swaps

The analysis cost was thus 3.8 core-seconds on average for each
implementation-compiler combination.
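
The arithmetic behind that average can be reproduced as follows. This is
only a sketch: core_seconds_per_combo is a hypothetical helper, not part
of saferewrite, and the inputs are the user and system times reported by
time(1) above.

```shell
# Sketch: average core-seconds per implementation-compiler combination,
# computed from the user and system seconds printed by time(1).
# core_seconds_per_combo is a hypothetical helper, not part of saferewrite.
core_seconds_per_combo() {
  # $1 = user seconds, $2 = system seconds, $3 = number of combinations
  awk -v user="$1" -v sys="$2" -v n="$3" \
    'BEGIN { printf "%.1f\n", (user + sys) / n }'
}

# 639 implementations, each compiled with 12 compilers
combos=$((639 * 12))
echo "$combos"                                      # 7668
core_seconds_per_combo 21884.21 6898.45 "$combos"   # 3.8
```

Core-seconds here means user time plus system time, summed over all
threads, rather than elapsed wall-clock time.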
Overall memory consumption for 64 threads varied but was never observed
to pass 10GB. Most processes had RSS under 200MB. The occasional process
with RSS around 1GB had total RAM usage around 3.5GB.
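
One way to watch overall memory consumption during a run is to sum the
RSS column of ps. The following is a sketch under assumptions: the
sum_rss_mb helper is hypothetical, the ps invocation in the comment
assumes procps as shipped with Debian, and nothing here is claimed to be
how the figures above were collected.

```shell
# Hypothetical helper: read RSS values in KB, one per line (the format
# printed by "ps -o rss="), and print their total in MB.
sum_rss_mb() {
  awk '{ kb += $1 } END { printf "%d\n", kb / 1024 }'
}

# Possible use while ./analyze is running (assumes procps ps):
#   ps -u "$(id -un)" -o rss= | sum_rss_mb

# Self-contained demonstration with two made-up RSS values (200MB, 1GB):
printf '%s\n' 204800 1048576 | sum_rss_mb           # 1224
```

Summing RSS counts pages shared between processes more than once, so
the total is an upper estimate of actual RAM usage.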
Disk usage for the unprivileged user carrying out measurements was 2GB
at this point, not counting the space for installed system packages.
Installing elfulator (see README-elfulator) increased the user's disk
usage to 16GB (mostly from 6.5GB for build-sparc and 7.1GB for
build-sparc64).
Unrolling 5 implementations of int32_negative_mask for sparc32 under
elfulator---

    cp compilers compilers.bak
    grep elfulator compilers.bak > compilers
    chmod +t src/*
    chmod -t src/int32_negative_mask
    time ./analyze
    grep unroll build/*/*/diet*/analysis/seconds

---took on average 53 core-seconds per implementation (with very little
variance: between 52 and 54 core-seconds for each implementation). RSS
for each analysis was consistently under 350MB. For comparison,
unrolling for amd64 without elfulator took 0.25 seconds for the fastest
implementation and 0.40 seconds for the slowest. Unrolling costs more
with elfulator because of the two layers of emulation.
Rerunning after adding elfulator lines for sparc64, arm64, arm32---

    ( echo 'c sparc64 elfulator: diet-sparc64 sparc64-linux-gcc -Os'
      echo 'c arm32 elfulator: diet-arm32 arm-linux-gnueabi-gcc -Os'
      echo 'c arm64 elfulator: diet-arm64 aarch64-linux-gnu-gcc -Os'
    ) >> compilers
    time ./analyze
    grep unroll build/*/*/diet*/analysis/seconds

---showed unrolling times between 68 seconds and 70 seconds for sparc64;
between 110 seconds and 112 seconds for arm64; and between 347 seconds
and 351 seconds for arm32.
Returning to the original compilers list (in total 13 C compilers,
including elfulator for sparc32) and analyzing all 248 cryptoint
functions (so 8307 compiler-implementation combinations total) with 64
threads---

    cp compilers.bak compilers
    chmod +t src/*
    chmod -t src/{int,uint}{8,16,32,64}_*
    chmod +t src/uint8_7bit_nonzero_mask_int16
    env THREADS=64 time ./analyze

---had the following timings:

    57021.68user 7237.07system 22:44.23elapsed 4710%CPU (0avgtext+0avgdata 2749464maxresident)k
    0inputs+3480056outputs (2151278major+997382828minor)pagefaults 0swaps

The 639 elfulator analyses thus added 35476 core-seconds, i.e., 56
core-seconds on average per implementation-compiler combination.
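
The added cost is the difference between the two runs' total CPU time
(user plus system). A sketch of the calculation, using hypothetical
helper names that are not part of saferewrite:

```shell
# Hypothetical helper: total core-seconds (user + system) of one run.
total_core_seconds() {
  awk -v user="$1" -v sys="$2" 'BEGIN { printf "%.2f\n", user + sys }'
}

# Hypothetical helper: print the added core-seconds and the average
# added core-seconds per analysis.
added_cost() {
  # $1 = total before, $2 = total after, $3 = number of added analyses
  awk -v b="$1" -v a="$2" -v n="$3" \
    'BEGIN { printf "%.0f %.0f\n", a - b, (a - b) / n }'
}

before=$(total_core_seconds 21884.21 6898.45)   # run with 12 compilers
after=$(total_core_seconds 57021.68 7237.07)    # run with 13 compilers
added_cost "$before" "$after" 639               # 35476 56
```

This attributes the whole difference to the 639 sparc32 elfulator
analyses, which assumes that the cost of the other 7668 combinations
was the same in both runs.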
Overall memory consumption for 64 threads was never observed to pass
16GB. The maximum observed RSS for a single process was 2.7GB. The
maximum observed VSZ for a single process was 56GB. For analyses of
the cryptoint implementations, the maximum observed RSS for a single
process was 387MB, and the maximum observed VSZ was 20GB.
Out of all 8307 compiler-implementation combinations, 8190 were
successfully unrolled. These include all 3224 compilations of the
cryptoint implementations, and 572 of the 639 implementations compiled
for sparc32 (including all 248 of the cryptoint implementations
compiled for sparc32). There were 8155 results marked equals-*. These
again include all 3224 compilations of the cryptoint implementations,
and 569 of the 639 implementations compiled for sparc32 (including all
248 of the cryptoint implementations).
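
Expressed as rates, these counts work out as follows. The percent
helper is a hypothetical name used only for this illustration. The
denominators are the 8307 combinations overall and the 639
implementations compiled for sparc32.

```shell
# Hypothetical helper: print part/whole as a percentage, one decimal.
percent() {
  awk -v part="$1" -v whole="$2" \
    'BEGIN { printf "%.1f%%\n", 100 * part / whole }'
}

percent 8190 8307   # unrolled, all combinations:  98.6%
percent 8155 8307   # equals-*, all combinations:  98.2%
percent 572 639     # unrolled, sparc32 only:      89.5%
percent 569 639     # equals-*, sparc32 only:      89.0%
```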
Experiments replacing python3 with pypy3 reduced user time by 2x while
increasing RAM usage by about 1.5x. Unfortunately, pypy3 occasionally
hangs in __futex_abstimed_wait_common64; currently saferewrite doesn't
know how to recognize the hang and restart the process.
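
Until saferewrite handles this itself, one possible workaround is to run
a command under a time limit and retry it if the limit expires. The
following is a sketch only: run_with_restart is a hypothetical helper,
not part of saferewrite, it relies on timeout(1) from GNU coreutils, and
it assumes that rerunning the command from scratch is safe.

```shell
# Sketch: run a command under a time limit; retry if the limit expires.
# run_with_restart is a hypothetical helper, not part of saferewrite.
run_with_restart() {
  # $1 = time limit in seconds, $2 = maximum attempts, rest = command
  limit=$1; tries=$2; shift 2
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    status=0
    timeout "$limit" "$@" || status=$?
    # timeout(1) exits with status 124 when the limit expired;
    # any other status comes from the command itself.
    [ "$status" -ne 124 ] && return "$status"
    echo "attempt $attempt exceeded ${limit}s; restarting" >&2
    attempt=$((attempt + 1))
  done
  return 124
}

# Demonstration: a 1-second limit on a 3-second sleep, 2 attempts.
run_with_restart 1 2 sleep 3 || echo "gave up with status $?"
```

A fixed limit cannot tell a hang from a legitimately slow analysis, so
the limit would have to be well above the longest expected run time
(for example, unrolling for arm32 under elfulator took about 350
seconds above). If a hung process ignores the default TERM signal, the
-k option of timeout sends KILL after a further delay.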
Some src/* functions are more complicated than the cryptoint functions.
Analyses of single implementations have been observed using 100GB RSS.