Discussion:
[PHC] Fwd: memory-hard password hashing
Erik Aronesty
2017-04-19 16:07:12 UTC
There is a focus on memory-hard algorithms in password hashing, since that
renders them ASIC/GPU resistant.

However, it is /possible/ for an ASIC to be tightly coupled with DRAM in
ways that can exceed PC memory performance.

Instead, is there an algorithm that is "very GPU friendly"?

GPUs are cheap, common, and highly optimized for certain operations in ways
that make the construction of ASICs that beat them at what they are best at
very unlikely. They are, essentially, "floating-point matrix operation
ASICs".

Yes, this means that servers using such a mechanism may want to have GPUs
installed to reduce hashing times and allow a larger number of operations
to increase hashing security. But I think this is a reasonable request
for many applications.

If it takes 10 milliseconds to compute a single hash with highly parallel
FLOPs on a modern GPU, that affords a massive amount of security for a hash
- ASIC resistant in ways that memory-hard algorithms cannot be.

Is there an algorithm that already lends itself to this kind of application?
Dean Pierce
2017-04-19 16:38:26 UTC
Personally I think "GPU friendly" is not a great thing for passwords, since
GPUs are asymmetrically more popular with attackers, but you might be
interested in checking out the "Vertcoin" project: https://vertcoin.org/

It's a cryptocoin, similar to Bitcoin, with the gimmick that they have
always tried to target the GPU mining audience. Their proof of work
started out with an adaptive N scrypt algorithm, but when KnC released
scrypt ASICs, they pivoted to a modified Lyra2, at which point a large
botnet took control of most of the mining power, and they pivoted again
with a Lyra2 more finely tuned for modern GPUs.

I think the ideal goal for password algorithms is CPU friendly, where the
most efficient machinery for calculating the hash is a commodity CPU. At
that point, you could roll your own ASICs, but they wouldn't be much better
than what you could just pick up at Best Buy, and you'd lose out on the
economies of scale that CPU vendors have.

- DEAN
Bill Cox
2017-04-19 17:04:03 UTC
I wrote a GPU-focused memory-hard proof-of-work algorithm with fast
verification that I call GPU Hog. Basically, it hashes the input key
material to initialize a CPRNG that selects several 4 KiB blocks from a
multi-GiB pool of pseudo-random data. It hashes the selected blocks
together, and if the result has enough leading zeros, then you've proven
you have done enough work.

It requires a modern GPU with high graphics-memory (GDDR) bandwidth, since
it is memory-bandwidth limited. For mining, you need all of the
pseudo-random data in memory to mine efficiently, but for verification, you
only need to generate the several selected pseudo-random blocks and hash
them together. Verification is faster than a typical public-key operation,
so it would not be the bottleneck in verifying a blockchain.
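The description above can be sketched roughly in Python as follows. This is a toy CPU-side illustration only, not Bill Cox's actual GPU Hog code: the pool size, constants, and function names are my own choices, and `random.Random` stands in for a proper cryptographic PRNG. A real miner would hold the multi-GiB pool in GPU memory and do the block reads there.

```python
import hashlib
import random

POOL_BLOCKS = 1024          # a real pool would be multi-GiB (millions of blocks)
BLOCK_SIZE = 4096           # 4 KiB blocks, as described
SELECTIONS = 8              # "several" blocks hashed per attempt
POOL_SEED = b"public-pool-seed"

def pool_block(i: int) -> bytes:
    """Deterministically expand block i of the pseudo-random pool.
    A miner precomputes every block; a verifier regenerates only the
    few blocks a given attempt selected."""
    out = bytearray()
    counter = 0
    while len(out) < BLOCK_SIZE:
        out += hashlib.sha256(
            POOL_SEED + i.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        counter += 1
    return bytes(out[:BLOCK_SIZE])

def attempt(key_material: bytes, nonce: int, difficulty_bits: int) -> bool:
    """One proof-of-work attempt: hash the input to seed a PRNG, select
    blocks, hash them together, and check for leading zero bits.
    The same function serves as the (cheap) verifier."""
    seed = hashlib.sha256(key_material + nonce.to_bytes(8, "big")).digest()
    rng = random.Random(seed)   # stand-in for a cryptographic PRNG
    h = hashlib.sha256()
    for _ in range(SELECTIONS):
        h.update(pool_block(rng.randrange(POOL_BLOCKS)))
    digest = int.from_bytes(h.digest(), "big")
    return digest >> (256 - difficulty_bits) == 0
```

Verification is cheap because `pool_block` regenerates just the selected blocks on demand; only mining needs the whole pool resident.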

Is this similar to what you had in mind, or are you specifically looking
for GPU based password hashing? I think that password hashing on mobile
phones and tablets could be considerably strengthened using their graphics
processors.

Bill
Erik Aronesty
2017-04-19 19:53:53 UTC
I don't get how "GPU Hog's" verification works.

There are several fast double-precision CPRNGs designed for high-speed GPU
operation that would be hard to beat with any ASIC.

Not sure how you managed to turn a CPRNG into a secure hash.
Erik Aronesty
2017-04-19 19:56:02 UTC
Just set the number of iterations high enough until I've achieved the 1ms I
want, or whatever is needed...?
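That tuning idea can be sketched as a calibration loop. This is an illustrative sketch only, using PBKDF2 as a stand-in for whatever underlying hash is used; the starting count and doubling strategy are my own choices.

```python
import hashlib
import time

def calibrate_iterations(target_seconds: float = 0.010) -> int:
    """Double the iteration count until a single hash takes at least
    target_seconds on this machine. A stand-in for tuning any
    iteration-style cost parameter to a desired wall-clock time."""
    iterations = 1000
    while True:
        start = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", b"password", b"salt", iterations)
        elapsed = time.perf_counter() - start
        if elapsed >= target_seconds:
            return iterations
        iterations *= 2     # coarse doubling; refine further if needed
```

The chosen count would then be stored alongside the hash so it can be raised as hardware improves.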
Solar Designer
2017-04-19 20:04:08 UTC
Post by Erik Aronesty
There are several fast double-precision CPRNG's designed for high speed GPU
operation that would be hard to beat with any ASIC.
What are they?

(Not arguing with you here, just want to learn.)

Alexander
Erik Aronesty
2017-04-19 20:44:49 UTC
My personal favorite, because I like the Lorenz attractor, and it's been
shown to be secure. Not sure how optimal it is for GPUs, but it's all
floating-point arithmetic. Possible to do it faster with an ASIC? Maybe.
https://arxiv.org/pdf/1702.07502.pdf

Fast PRNG, not sure how secure it is:
https://pdfs.semanticscholar.org/cfdc/468d3021fe0020a138cc4e335a6b9afa6b29.pdf

This library analyzes and optimizes a number of well-known PRNGs for GPUs;
I haven't read it too closely.
https://arxiv.org/pdf/1307.5869.pdf

Fourier transforms, should be fairly secure, needs more analysis:
https://arxiv.org/pdf/1411.2484.pdf
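As a toy illustration of what a double-precision chaotic-map generator looks like, here is an Euler-integrated Lorenz system with bytes drawn from the low mantissa bits. This is not the construction from the paper above, and it is not cryptographically secure as written; it only shows the all-floating-point style of computation being discussed.

```python
import struct

def lorenz_prng(seed: float, n: int,
                sigma: float = 10.0, rho: float = 28.0,
                beta: float = 8.0 / 3.0, dt: float = 0.01) -> bytes:
    """Toy PRNG: integrate the Lorenz system with Euler steps and emit
    the low byte of x's IEEE-754 mantissa, where trajectory divergence
    from a perturbed seed shows up first. Illustration only."""
    x, y, z = 1.0 + seed, 1.0, 1.0
    out = bytearray()
    for _ in range(n):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        bits = struct.unpack("<Q", struct.pack("<d", x))[0]
        out.append(bits & 0xFF)
    return bytes(out)
```

Nearby seeds diverge quickly, which is the property such constructions lean on; the question is how to do it in a way that is actually secure.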
Erik Aronesty
2017-04-19 19:36:32 UTC
It is precisely Vertcoin's experience with botnets that would lead me to
want a GPU-friendly solution instead. It represents a middle ground.
"Lyra2RE" seems to be GPU-friendly, but also not sufficiently ASIC
resistant.

Maybe something that uses parallel FLOPs might be ideal for maximizing GPU
utility. Sequential memory-hard operations seem to target CPUs in a way
that would not be ideal in situations where botnet attacks are a concern.
Solar Designer
2017-04-19 20:01:34 UTC
Post by Erik Aronesty
It is precisely vertcoin's experience with botnets that would lead me to
want a GPU-friendly solution instead. Represents a middle-ground.
yescrypt's ROM was also motivated in part by the threat of botnets. You
don't need to go for GPUs for botnet resistance - you can use and depend
on a lot of RAM instead (more than you'd expect in most botnet nodes in
a few years' time). Putting more RAM into servers doesn't have as many
drawbacks as putting GPUs in there.

Alexander
Solar Designer
2017-04-19 18:27:55 UTC
Post by Erik Aronesty
There is a focus on memory-hard algorithms in password hashing, since that
renders them ASIC/GPU resistant.
It's not that simple. (Maybe you over-simplified on purpose.) There's
a need to distinguish sequential memory-hard (as formally introduced
with scrypt) and parallelizable/amortizable memory-hard. And there are
specific parameters to be "ASIC/GPU resistant" to varying extent. BTW,
the scrypt paper doesn't even mention GPUs.
Post by Erik Aronesty
However, it is /possible/ for an ASIC to be tightly coupled with DRAM in
ways that can exceed PC memory performance.
Yes.

You might misunderstand "memory-hard" as meaning "memory-bound", but
those are distinct properties. (Or maybe you're over-simplifying.)

Memory-hardness is passing onto the attacker the cost of memory itself
(die area occupied by memory), not the cost of providing a certain
memory bandwidth. So you need to distinguish: sequential memory-hard,
parallelizable/amortizable memory-hard, and memory-bound. There may
also be combinations of these.

To address the potential for reduction in the "time" factor in
memory*time in an ASIC with higher memory bandwidth, sequential
memory-hard schemes do quite some computation. The extra bandwidth
isn't really beneficial to an attacker unless there's sufficient memory
(aka sufficient die area occupied by memory) to accommodate a larger
number of instances, and/or there's a corresponding increase in
computation speed within each one instance. This is why some PHC
schemes introduced compute latency hardening through integer
multiplication and/or small S-box lookups (taking on the order of ~1ns
on a CPU and presumably similar in an ASIC).
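A minimal sketch of such multiplication-based latency hardening (the constants here are illustrative, not taken from any PHC entry): each round's input is the previous round's output, so the chain of ~1 ns multiplies sits on the critical path and cannot be shortened by extra parallelism or memory bandwidth alone.

```python
def mul_hardened_mix(state: int, rounds: int) -> int:
    """Sequentially dependent 64-bit multiply chain. Every round needs
    the previous result, so an ASIC gains little unless it can make the
    multiplier itself faster. Constants are arbitrary odd 64-bit values."""
    mask = (1 << 64) - 1
    for _ in range(rounds):
        # multiply-then-add keeps the multiplier's latency on the
        # critical path every round; OR-ing in 1 avoids a zero fixpoint
        state = ((state | 1) * 0x9E3779B97F4A7C15 + 0x2545F4914F6CDD1D) & mask
        state ^= state >> 33    # cheap diffusion of the high bits
    return state
```

In a real scheme this kind of chain is interleaved with the memory-filling work, so neither the compute nor the memory cost can be attacked in isolation.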
Post by Erik Aronesty
Instead, is there an algorithm that is "very GPU friendly"?
Yes. As it happens, scrypt at low N and r=1, much like it's "misused"
in Litecoin, is very GPU friendly. Litecoin uses p=1, but if you want
to occupy the entire GPU by one hash computation, you can set p to a
high value (thousands). Of course, ASICs do better yet.
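For illustration, that parameter choice expressed with Python's `hashlib.scrypt` (example values only, not a recommendation, and chosen small enough to run on a CPU):

```python
import hashlib

def gpu_friendly_scrypt(password: bytes, salt: bytes) -> bytes:
    """scrypt tuned the "GPU-friendly" way described above: low N and
    r=1 keep each instance's memory small, and a large p supplies many
    independent lanes (thousands on a real GPU)."""
    return hashlib.scrypt(password, salt=salt,
                          n=1024,   # low N: ~128 KiB per lane at r=1
                          r=1,
                          p=256,    # many parallel lanes
                          dklen=32)
```

Litecoin's p=1 leaves the GPU to parallelize across candidate hashes instead; raising p parallelizes within one hash computation.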

For asymmetric PoW, as it happens Equihash (at least as used in Zcash)
is very GPU friendly. ASICs are likely to do better yet (although it'd
take more R&D effort than Litecoin scrypt ASICs did):

http://www.openwall.com/articles/Zcash-Equihash-Analysis

These two are very different: scrypt is sequential memory-hard, which we
mostly undo with low N and high p, while Equihash is parallelizable
memory-hard and likely memory-bound as it is.
Post by Erik Aronesty
GPU's are cheap, common and highly optimized for certain operations in ways
that make the construction of ASICs that beat them at what they are best at
very unlikely. They are, essentially, "floating point matrix operation
ASICs".
Sort of, but "very unlikely" is an exaggeration. GPUs are not that
specialized, and cost efficiency improvement is generally possible (the
questions are: at what budget and by how much).

That said, I do feel that running on a GPU and not making use of its
computing power (which scrypt and Equihash use little of) is a waste.
So if I were to target a GPU in a sequential-memory hard scheme, I'd
probably include some compute latency hardening through multiplication
e.g. by using yescrypt with unusually tiny pwxform S-boxes (like 256
bytes per instance) or effectively without them at all (a modification,
or size so tiny it's not requiring a local memory lookup but merely a
conditional move from one of a couple of registers).
Post by Erik Aronesty
Yes, this means that servers using such a mechanism may want to have GPUs
installed to reduce hashing times and allow a larger number of operations
to increase hashing security. But I think this is a reasonable request
for many applications.
Yes, but when you consider the overhead of less standard hardware
acquisition and systems administration, and the extra bugs and security
vulnerabilities of GPUs/drivers (forcibly giving up on local security in
those systems), and compare that against deploying HSM-alikes (similar
overhead), the latter may end up winning. They would not provide the
heavy hashing, but instead they'd hold secrets (e.g. a hash encryption
key). You would still be able to do some heavy processing on the host
(in case an HSM secret leaks).
Post by Erik Aronesty
If it takes 10 milliseconds to compute a single hash with highly parallel
FLOPs on a modern GPU, that affords a massive amount of security for a hash
Yes.
Post by Erik Aronesty
- ASIC resistant in ways that memory-hard algorithms cannot be.
That's questionable.

Then there's also that "ROM-port-hardness" aspect I introduced in 2012.
With it in the mix, even ASICs would need to provide the, say, 112 GiB
of memory - and efficiently sharing it across a large number of ASICs
(to amortize its cost to an attacker) isn't that easy nor cheap.
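A toy sketch of the ROM-port-hardness idea, with state-dependent reads from a large shared ROM mixed into the hash. This is an illustration of the concept only, not yescrypt's actual ROM construction; a real deployment would use a multi-GiB ROM, and the chunk size and read count here are arbitrary.

```python
import hashlib

def rom_hard_hash(password: bytes, salt: bytes, rom: bytes,
                  reads: int = 64) -> bytes:
    """Mix sequential, state-dependent reads from a shared ROM into a
    password hash. Each read index depends on the evolving state, so
    an attacker cannot precompute anything without the full ROM."""
    chunk = 64
    n_chunks = len(rom) // chunk
    state = hashlib.sha256(password + salt).digest()
    for _ in range(reads):
        i = int.from_bytes(state[:8], "big") % n_chunks
        state = hashlib.sha256(state + rom[i * chunk:(i + 1) * chunk]).digest()
    return state
```

Because the reads are unpredictable and sequential, sharing one ROM across many cracking ASICs means sharing its random-access bandwidth too, which is the expensive part.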
Post by Erik Aronesty
Is there an algorithm that already lends itself to this kind of application?
Like I said, yes. But I currently don't recommend that.

BTW, 10ms is a lot for a deployment that is large-scale enough it would
even consider customizing the hardware. It's not typical. But if you
can afford that, it means you can do ~128 MiB (ye)scrypt on CPUs in a
typical dual-socket server. That's unusually high as it is. And you
can add to that e.g. a 112 GiB ROM, still staying only with standard
server hardware (CPUs+RAM). Then why also bother with GPUs?

Alexander

P.S. Why the "Fwd" in the Subject? Are you adding to some old thread?