uint256: optimize Mul, squared and Exp #152

Merged
merged 1 commit into from
Mar 25, 2024


Contributor

@AaronChen0 AaronChen0 commented Mar 22, 2024

Using four local uint64 variables enables register allocation instead of memory allocation for uint256.
This improves the performance of Mul, squared, and Exp by about 40%.


Edit:
I later found that the above is not the reason this pull request improves performance.
Optimizing uint256.Set achieves the same effect. See #153.
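
For readers unfamiliar with the pattern, here is a minimal sketch of what "four local uint64 variables" can look like for a truncated 256-bit multiplication. It is an illustration only, not the diff from this pull request: the `Int` type, `mulLocal`, and the two helper functions are hypothetical names defined inside the example, and whether the locals actually end up in registers depends on the compiler (per the edit above, the measured speedup turned out to have a different cause).

```go
package main

import (
	"fmt"
	"math/bits"
)

// Int is a hypothetical stand-in for a 256-bit unsigned integer stored as
// four 64-bit limbs, least significant limb first.
type Int [4]uint64

// umulHop returns z + x*y as a 128-bit value (hi, lo). It cannot overflow.
func umulHop(z, x, y uint64) (hi, lo uint64) {
	hi, lo = bits.Mul64(x, y)
	lo, carry := bits.Add64(lo, z, 0)
	hi, _ = bits.Add64(hi, 0, carry)
	return hi, lo
}

// umulStep returns z + x*y + carry as a 128-bit value (hi, lo). It cannot overflow.
func umulStep(z, x, y, carry uint64) (hi, lo uint64) {
	hi, lo = bits.Mul64(x, y)
	lo, c := bits.Add64(lo, carry, 0)
	hi, _ = bits.Add64(hi, 0, c)
	lo, c = bits.Add64(lo, z, 0)
	hi, _ = bits.Add64(hi, 0, c)
	return hi, lo
}

// mulLocal sets z = x*y mod 2^256 and returns z.
// All limbs of x and y are copied into locals before z is written,
// so the result is also correct when z aliases x or y.
func mulLocal(z, x, y *Int) *Int {
	var (
		x0, x1, x2, x3 = x[0], x[1], x[2], x[3]
		y0, y1, y2, y3 = y[0], y[1], y[2], y[3]

		carry0, carry1, carry2 uint64
		r0, r1, r2             uint64
	)

	// Row y0: contributes to limbs 0..2, carry0 feeds limb 3.
	carry0, r0 = bits.Mul64(x0, y0)
	carry0, r1 = umulHop(carry0, x1, y0)
	carry0, r2 = umulHop(carry0, x2, y0)

	// Row y1: contributes to limbs 1..2, carry1 feeds limb 3.
	carry1, r1 = umulHop(r1, x0, y1)
	carry1, r2 = umulStep(r2, x1, y1, carry1)

	// Row y2: contributes to limb 2, carry2 feeds limb 3.
	carry2, r2 = umulHop(r2, x0, y2)

	// Limb 3 only needs the low 64 bits of each term; overflow is discarded.
	r3 := x3*y0 + x2*y1 + x1*y2 + x0*y3 + carry0 + carry1 + carry2

	z[0], z[1], z[2], z[3] = r0, r1, r2, r3
	return z
}

func main() {
	var z Int
	mulLocal(&z, &Int{2, 0, 0, 0}, &Int{3, 0, 0, 0})
	fmt.Println(z) // prints "[6 0 0 0]"
}
```

Loading every limb up front and storing the result at the end keeps the arithmetic free of reads through `x`, `y`, and `z`, which is the property the description above attributes the speedup to.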


Running

go test ./...

returns

ok  	github.com/holiman/uint256	0.978s

Benchmark

go test -run - -bench BenchmarkExp -benchmem -count=10 >/tmp/old

Squared benchmark:

goos: linux
goarch: amd64
pkg: github.com/holiman/uint256
cpu: AMD Ryzen 7 7735H with Radeon Graphics         
                         │     old     │                 new                 │
                         │   sec/op    │   sec/op     vs base                │
Square/single/uint256-16   7.235n ± 3%   4.304n ± 1%  -40.52% (p=0.000 n=10)
Square/single/big-16       36.73n ± 1%   36.72n ± 1%        ~ (p=0.782 n=10)
geomean                    16.30n        12.57n       -22.89%

                         │     old      │                 new                 │
                         │     B/op     │    B/op     vs base                 │
Square/single/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Square/single/big-16       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                               ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                         │     old      │                 new                 │
                         │  allocs/op   │ allocs/op   vs base                 │
Square/single/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Square/single/big-16       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                               ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Exp benchmark:

goos: linux
goarch: amd64
pkg: github.com/holiman/uint256
cpu: AMD Ryzen 7 7735H with Radeon Graphics         
                     │     old     │                 new                 │
                     │   sec/op    │   sec/op     vs base                │
Exp/large/big-16       15.70µ ± 4%   16.22µ ± 2%   +3.30% (p=0.011 n=10)
Exp/large/uint256-16   3.986µ ± 1%   1.592µ ± 3%  -60.06% (p=0.000 n=10)
Exp/small/big-16       5.275µ ± 3%   5.277µ ± 2%        ~ (p=0.684 n=10)
Exp/small/uint256-16   338.6n ± 0%   137.6n ± 2%  -59.37% (p=0.000 n=10)
geomean                3.252µ        2.081µ       -36.01%

                     │      old       │                  new                  │
                     │      B/op      │     B/op      vs base                 │
Exp/large/big-16       17.72Ki ± 0%     17.72Ki ± 0%       ~ (p=1.000 n=10) ¹
Exp/large/uint256-16     0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Exp/small/big-16       7.219Ki ± 0%     7.219Ki ± 0%       ~ (p=1.000 n=10) ¹
Exp/small/uint256-16     0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                             ²                 +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                     │     old      │                 new                 │
                     │  allocs/op   │ allocs/op   vs base                 │
Exp/large/big-16       189.0 ± 0%     189.0 ± 0%       ~ (p=1.000 n=10) ¹
Exp/large/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Exp/small/big-16       77.00 ± 0%     77.00 ± 0%       ~ (p=1.000 n=10) ¹
Exp/small/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                           ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Mul benchmark:

goos: linux
goarch: amd64
pkg: github.com/holiman/uint256
cpu: AMD Ryzen 7 7735H with Radeon Graphics         
                                 │     old     │                 new                 │
                                 │   sec/op    │   sec/op     vs base                │
Mul/single/uint256-16              6.892n ± 3%   4.358n ± 1%  -36.77% (p=0.000 n=10)
Mul/single/big-16                  37.78n ± 1%   37.78n ± 1%        ~ (p=0.868 n=10)
MulOverflow/single/uint256-16      15.01n ± 2%   15.01n ± 2%        ~ (p=0.853 n=10)
MulOverflow/single/big-16          37.74n ± 1%   37.73n ± 2%        ~ (p=0.956 n=10)
MulMod/small/uint256-16            21.39n ± 1%   21.38n ± 1%        ~ (p=0.956 n=10)
MulMod/mod64/uint256-16            38.74n ± 1%   39.01n ± 1%   +0.68% (p=0.027 n=10)
MulMod/mod128/uint256-16           64.02n ± 5%   63.04n ± 1%   -1.53% (p=0.022 n=10)
MulMod/mod192/uint256-16           79.52n ± 0%   79.16n ± 1%        ~ (p=0.128 n=10)
MulMod/mod256/uint256-16           94.50n ± 1%   94.59n ± 1%        ~ (p=0.739 n=10)
MulMod/mod256/uint256r-16          44.35n ± 1%   43.95n ± 1%        ~ (p=0.109 n=10)
MulMod/small/big-16                37.88n ± 3%   38.48n ± 3%        ~ (p=0.052 n=10)
MulMod/mod64/big-16                62.54n ± 1%   61.74n ± 4%        ~ (p=0.436 n=10)
MulMod/mod128/big-16               210.6n ± 3%   209.9n ± 3%        ~ (p=0.684 n=10)
MulMod/mod192/big-16               243.6n ± 3%   240.1n ± 3%        ~ (p=0.481 n=10)
MulMod/mod256/big-16               276.8n ± 2%   277.4n ± 3%        ~ (p=0.631 n=10)
MulDivOverflow/small/uint256-16    2.099n ± 3%   1.972n ± 1%   -6.00% (p=0.000 n=10)
MulDivOverflow/div64/uint256-16    2.085n ± 1%   1.964n ± 0%   -5.80% (p=0.000 n=10)
MulDivOverflow/div128/uint256-16   2.143n ± 2%   2.030n ± 1%   -5.25% (p=0.000 n=10)
MulDivOverflow/div192/uint256-16   2.160n ± 1%   2.037n ± 3%   -5.69% (p=0.000 n=10)
MulDivOverflow/div256/uint256-16   2.220n ± 2%   2.095n ± 2%   -5.65% (p=0.000 n=10)
MulDivOverflow/small/big-16        14.10n ± 2%   14.17n ± 1%        ~ (p=0.148 n=10)
MulDivOverflow/div64/big-16        14.06n ± 2%   14.25n ± 4%   +1.35% (p=0.016 n=10)
MulDivOverflow/div128/big-16       16.79n ± 5%   16.96n ± 4%        ~ (p=0.353 n=10)
MulDivOverflow/div192/big-16       17.10n ± 2%   17.19n ± 3%        ~ (p=0.927 n=10)
MulDivOverflow/div256/big-16       18.89n ± 5%   18.68n ± 7%        ~ (p=0.280 n=10)
geomean                            22.14n        21.47n        -3.01%

                                 │     old      │                 new                  │
                                 │     B/op     │    B/op      vs base                 │
Mul/single/uint256-16              0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
Mul/single/big-16                  0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulOverflow/single/uint256-16      0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulOverflow/single/big-16          0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/small/uint256-16            0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/mod64/uint256-16            0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/mod128/uint256-16           0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/mod192/uint256-16           0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/mod256/uint256-16           0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/mod256/uint256r-16          0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/small/big-16                7.500 ± 7%     7.000 ± 14%       ~ (p=0.650 n=10)
MulMod/mod64/big-16                48.00 ± 0%     48.00 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/mod128/big-16               128.0 ± 0%     128.0 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/mod192/big-16               144.0 ± 0%     144.0 ±  0%       ~ (p=1.000 n=10) ¹
MulMod/mod256/big-16               176.0 ± 0%     176.0 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/small/uint256-16    0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div64/uint256-16    0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div128/uint256-16   0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div192/uint256-16   0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div256/uint256-16   0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/small/big-16        0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div64/big-16        0.000 ± 0%     0.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div128/big-16       1.000 ± 0%     1.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div192/big-16       1.000 ± 0%     1.000 ±  0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div256/big-16       3.000 ± 0%     3.000 ± 33%       ~ (p=0.087 n=10)
geomean                                       ²                -0.28%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                 │     old      │                 new                 │
                                 │  allocs/op   │ allocs/op   vs base                 │
Mul/single/uint256-16              0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Mul/single/big-16                  0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulOverflow/single/uint256-16      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulOverflow/single/big-16          0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/small/uint256-16            0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/mod64/uint256-16            0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/mod128/uint256-16           0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/mod192/uint256-16           0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/mod256/uint256-16           0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/mod256/uint256r-16          0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/small/big-16                0.000 ±  ?     0.000 ±  ?       ~ (p=1.000 n=10)
MulMod/mod64/big-16                1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/mod128/big-16               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/mod192/big-16               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
MulMod/mod256/big-16               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/small/uint256-16    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div64/uint256-16    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div128/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div192/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div256/uint256-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/small/big-16        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div64/big-16        0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div128/big-16       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div192/big-16       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
MulDivOverflow/div256/big-16       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                       ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean


codecov bot commented Mar 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (97405b6) to head (d6c1d7d).

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #152   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         5           
  Lines         1643      1642    -1     
=========================================
- Hits          1643      1642    -1     

Owner

@holiman holiman left a comment


Nice!!

@holiman holiman changed the title from "uint256: optimize Mul, squared" to "uint256: optimize Mul, squared and Exp" Mar 25, 2024
@holiman holiman merged commit c9fc0ce into holiman:master Mar 25, 2024
5 checks passed