-rw-r--r-- 4923 saferewrite-20260206/README-resources raw
Recommendations: (1) use a modern server with a reasonable amount of RAM, (2) limit the number of threads that saferewrite will use (see the THREADS mechanism in README), and (3) select the analyses that you're interested in (see the chmod mechanism in README).

The following measurements were collected on rome2, a dual EPYC 7742 (128 cores overall) with overclocking disabled (so the CPUs were running at 2.245GHz), 512GB RAM, and 1TB swap space, running Debian 12.

One run (before elfulator was installed) analyzed all 248 cryptoint functions, covering not just the 248 cryptoint implementations of those functions but also 391 other implementations:

    chmod +t src/*
    chmod -t src/{int,uint}{8,16,32,64}_*
    chmod +t src/uint8_7bit_nonzero_mask_int16
    env THREADS=64 time ./analyze

There were 639 implementations in total, each of which was compiled with 12 compilers, so 7668 implementation-compiler combinations total. Timings:

    21884.21user 6898.45system 10:05.60elapsed 4752%CPU (0avgtext+0avgdata 1117584maxresident)k
    302048inputs+3216840outputs (2182847major+911883360minor)pagefaults 0swaps

The analysis cost was thus 3.8 core-seconds on average for each implementation-compiler combination (28782.66 user plus system seconds divided by 7668).

Overall memory consumption for 64 threads varied but was never observed to pass 10GB. Most processes had RSS under 200MB. The occasional process with RSS around 1GB had total RAM usage around 3.5GB.

Disk usage for the unprivileged user carrying out measurements was 2GB at this point, not counting the space for installed system packages. Installing elfulator (see README-elfulator) increased the user's disk usage to 16GB (mostly from 6.5GB for build-sparc and 7.1GB for build-sparc64).
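Judging from the chmod commands above, ./analyze skips src/* entries that have the sticky bit set (chmod +t) and analyzes entries where it is clear (chmod -t). The following minimal Python sketch lists which functions a run would analyze under that convention. The convention is inferred from these commands rather than taken from saferewrite's code, and the helper name selected_functions is hypothetical.

```python
import os
import stat


def selected_functions(src_dir="src"):
    """List the entries of src_dir whose sticky bit is clear.

    Assumption (inferred from the README's commands, not from saferewrite
    itself): chmod +t marks an entry to be skipped, chmod -t marks it to
    be analyzed.
    """
    selected = []
    for name in sorted(os.listdir(src_dir)):
        mode = os.stat(os.path.join(src_dir, name)).st_mode
        if not mode & stat.S_ISVTX:  # sticky bit clear: would be analyzed
            selected.append(name)
    return selected
```

Run from the saferewrite directory after the three chmod commands above, `print(selected_functions())` should list only the src/{int,uint}{8,16,32,64}_* entries other than uint8_7bit_nonzero_mask_int16.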
Unrolling 5 implementations of int32_negative_mask for sparc32 under elfulator

    cp compilers compilers.bak
    grep elfulator compilers.bak > compilers
    chmod +t src/*
    chmod -t src/int32_negative_mask
    time ./analyze
    grep unroll build/*/*/diet*/analysis/seconds

took on average 53 core-seconds per implementation, with very little variance (between 52 and 54 core-seconds for each implementation). RSS for each analysis was consistently under 350MB. For comparison, unrolling for amd64 without elfulator took 0.25 seconds for the fastest implementation and 0.40 seconds for the slowest. Unrolling costs more under elfulator because of the two layers of emulation.

Rerunning after adding elfulator lines for sparc64, arm64, and arm32

    (
      echo 'c sparc64 elfulator: diet-sparc64 sparc64-linux-gcc -Os'
      echo 'c arm32 elfulator: diet-arm32 arm-linux-gnueabi-gcc -Os'
      echo 'c arm64 elfulator: diet-arm64 aarch64-linux-gnu-gcc -Os'
    ) >> compilers
    time ./analyze
    grep unroll build/*/*/diet*/analysis/seconds

showed unrolling times between 68 and 70 seconds for sparc64, between 110 and 112 seconds for arm64, and between 347 and 351 seconds for arm32.

Returning to the original compilers list (13 C compilers in total, including elfulator for sparc32) and analyzing all 248 cryptoint functions with 64 threads (639 implementations times 13 compilers, so 8307 implementation-compiler combinations total)

    cp compilers.bak compilers
    chmod +t src/*
    chmod -t src/{int,uint}{8,16,32,64}_*
    chmod +t src/uint8_7bit_nonzero_mask_int16
    env THREADS=64 time ./analyze

had the following timings:

    57021.68user 7237.07system 22:44.23elapsed 4710%CPU (0avgtext+0avgdata 2749464maxresident)k
    0inputs+3480056outputs (2151278major+997382828minor)pagefaults 0swaps

Compared to the 12-compiler run, the 639 elfulator analyses thus added 35476 core-seconds (64258.75 minus 28782.66 user plus system seconds), i.e., 56 core-seconds on average per implementation-compiler combination. Overall memory consumption for 64 threads was never observed to pass 16GB. The maximum observed RSS for a single process was 2.7GB.
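The per-combination costs above are user plus system time divided by the number of implementation-compiler combinations, and the elfulator cost is the difference between the two runs divided by 639. The following short Python sketch reproduces that arithmetic from the first line of GNU time's default output (the `time` that `env` runs here is the external GNU time, not the shell builtin). The regular expression and helper names are illustrative assumptions, not part of saferewrite.

```python
import re

# First line of GNU time's default output, e.g.
# "21884.21user 6898.45system 10:05.60elapsed 4752%CPU ..."
TIME_LINE = re.compile(
    r"(?P<user>[\d.]+)user\s+(?P<system>[\d.]+)system\s+"
    r"(?P<elapsed>[\d:.]+)elapsed\s+(?P<cpu>\d+)%CPU"
)


def elapsed_seconds(field):
    """Convert an elapsed field such as '10:05.60' or '1:02:03' to seconds."""
    seconds = 0.0
    for part in field.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def core_seconds(line):
    """Return (user + system core-seconds, elapsed seconds) for a time line."""
    m = TIME_LINE.search(line)
    if m is None:
        raise ValueError("not a GNU time summary line: %r" % line)
    return float(m["user"]) + float(m["system"]), elapsed_seconds(m["elapsed"])


# Figures copied from the 12-compiler and 13-compiler runs above.
run12, wall12 = core_seconds("21884.21user 6898.45system 10:05.60elapsed 4752%CPU")
run13, wall13 = core_seconds("57021.68user 7237.07system 22:44.23elapsed 4710%CPU")

per_combo_12 = run12 / (639 * 12)          # 12 compilers, no elfulator
per_combo_13 = run13 / (639 * 13)          # 13 compilers, elfulator for sparc32
elfulator_per_impl = (run13 - run12) / 639
cpu_percent_13 = 100 * run13 / wall13      # GNU time reports an integer percentage

print(f"{per_combo_12:.2f} {per_combo_13:.2f} {elfulator_per_impl:.1f} {cpu_percent_13:.0f}")
```

This gives about 3.75 core-seconds per combination for the 12-compiler run, 7.74 for the 13-compiler run, and 55.5 extra core-seconds per implementation for elfulator. These values match the rounded figures of 3.8 and 56 in the text.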
In the 13-compiler run, the maximum observed VSZ for a single process was 56GB. For analyses of the cryptoint implementations, the maximum observed RSS for a single process was 387MB, and the maximum observed VSZ was 20GB.

Out of all 8307 implementation-compiler combinations, 8190 were successfully unrolled (including all 3224 compilations of the cryptoint implementations). These included 572 of the 639 implementations compiled for sparc32, among them all 248 cryptoint implementations compiled for sparc32. There were 8155 results marked equals-* (again including all 3224 compilations of the cryptoint implementations). These included 569 of the implementations compiled for sparc32, among them all 248 cryptoint implementations.

Experiments replacing python3 with pypy3 halved user time while increasing RAM usage by about 1.5x. Unfortunately, pypy3 occasionally hangs in __futex_abstimed_wait_common64, and saferewrite currently doesn't know how to recognize the hang and restart the process.

Some src/* functions are more complicated than the cryptoint functions, and analyses of single implementations have been observed using 100GB RSS.
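Because every cryptoint compilation was both unrolled and marked equals-*, all failures in the success counts above come from the other 391 implementations. The following short Python sketch separates the two groups using only the counts stated in the text. The constant and function names are illustrative.

```python
# Counts copied from the text above; the percentages are plain arithmetic.
TOTAL = 639 * 13             # 8307 implementation-compiler combinations
CRYPTOINT = 248 * 13         # 3224 cryptoint compilations (all unrolled, all equals-*)
UNROLLED, EQUALS = 8190, 8155
SPARC32_TOTAL, SPARC32_CRYPTOINT = 639, 248
SPARC32_UNROLLED, SPARC32_EQUALS = 572, 569


def percent(part, whole):
    return 100.0 * part / whole


def breakdown():
    """Success rates overall and for the non-cryptoint implementations only."""
    other = TOTAL - CRYPTOINT                              # 5083 combinations
    sparc32_other = SPARC32_TOTAL - SPARC32_CRYPTOINT      # 391 implementations
    return {
        "unrolled, all": percent(UNROLLED, TOTAL),
        "equals-*, all": percent(EQUALS, TOTAL),
        "unrolled, non-cryptoint": percent(UNROLLED - CRYPTOINT, other),
        "equals-*, non-cryptoint": percent(EQUALS - CRYPTOINT, other),
        "unrolled, sparc32 non-cryptoint":
            percent(SPARC32_UNROLLED - SPARC32_CRYPTOINT, sparc32_other),
        "equals-*, sparc32 non-cryptoint":
            percent(SPARC32_EQUALS - SPARC32_CRYPTOINT, sparc32_other),
    }


for label, value in breakdown().items():
    print(f"{label}: {value:.1f}%")
```

This gives about 98.6% unrolled and 98.2% equals-* overall, 97.7% and 97.0% for the non-cryptoint implementations, and 82.9% and 82.1% for the non-cryptoint implementations compiled for sparc32. Of the 117 combinations that did not unroll, 67 were sparc32 compilations under elfulator.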