Discussion:
[PHC] An API for Efficiently Obtaining t_cost
Taylor Hornby
2015-10-11 20:52:08 UTC
Permalink
Hi,

I'm not sure if it's the PHC's job to standardize on an API for the
winning hash function, but I'm going to strongly argue that
implementations MUST to provide an API similar to the following:

int PHS_find_t_cost(
void *out,                 /* same */
size_t outlen,             /* same */
const void *in,            /* same */
size_t inlen,              /* same */
     const void *salt,          /* same */
size_t saltlen,            /* same */
unsigned int target_msec,  /* target milliseconds */
        unsigned int *t_cost,      /* output parameter */
unsigned int m_cost        /* same */
);

Calling this API should *efficiently* find the best t_cost for the
given m_cost and target running time, on the current system under its
current workload, while at the same time computing a hash. In other
words, for a fixed m_cost, it should find the smallest valid t_cost
such that the time taken is longer than target_msec milliseconds.

The problem is that on memory-constrained devices, you might want to
use as much memory you can, and then keep computing on that memory for
a certain amount of time, say 5 seconds, when you don't know which
t_cost to use for that amount of time, and you don't want to do trial-
and-error or extrapolation from faster calls.

I think this is an extremely important API to mandate, since if the
developer needs it, and it isn't provided for them, they either have to
do trial-and-error or heuristically guess-then-check t_cost which is
not efficient, OR they have to modify or even re-implement the
password-hashing library, likely losing important optimizations and
thus giving the attacker an advantage in the process.

It will also result higher t_cost being chosen in practice. Developers
can hard-code the time cost in units of seconds, so that t_cost is
always as high as it can be without impacting usability. Without this
API, they'd have to run benchmarks on the system to select t_cost, or
hard-code non-optimal costs, like the libsodium API [1,2,3] does.

It's possible to efficiently provide this API for the iteration count
in PBKDF2 as well as for 'p' in scrypt. I think it is also possible for
Argon2.

Even if we decide not to require this API, please consider adding it to
your implementations.

Thanks,
-Taylor

[1] http://doc.libsodium.org/password_hashing/index.html
[2] https://github.com/jedisct1/libsodium/blob/e326ef90301cf570a5e6691c
46d4a1d37adc1e07/src/libsodium/include/sodium/crypto_pwhash_scryptsalsa
208sha256.h#L28
[3] https://github.com/jedisct1/libsodium/blob/e326ef90301cf570a5e6691c
46d4a1d37adc1e07/src/libsodium/crypto_pwhash/scryptsalsa208sha256/pwhas
h_scryptsalsa208sha256.c#L18

p.s. Apologies if this has been discussed before. There's a lot on this
list I haven't read.
Bill Cox
2015-10-12 00:08:15 UTC
Permalink
Some entries (I think including Yescrypt and TwoCats) do something similar,
and provide estimated parameters to meet your time goal. These tools take
a max_memory and target_time parameters. The more typical case is that we
cannot fill as much memory as you would like for a given target time. so
t_cost is set to the minimum value, and an m_cost parameter is suggested to
target the desired runtime. In rare cases, you want to fill memory and
keep rehashing, in which case m_cost is set to max_mem and t_cost is
adjusted accordingly.

I have yet to see an actual deployment making use of more than the minimum
t_cost parameter in any application, so I'm not a fan of that parameter,
but it's there, and both m_cost and t_cost should be suggested for Argon2.
I suspect their team already has something like this in the works. They
are welcome to look at my code in twocats-common.c to see what I did.

Bill
Post by Taylor Hornby
Hi,
I'm not sure if it's the PHC's job to standardize on an API for the
winning hash function, but I'm going to strongly argue that
int PHS_find_t_cost(
void *out, /* same */
size_t outlen, /* same */
const void *in, /* same */
size_t inlen, /* same */
const void *salt, /* same */
size_t saltlen, /* same */
unsigned int target_msec, /* target milliseconds */
unsigned int *t_cost, /* output parameter */
unsigned int m_cost /* same */
);
Calling this API should *efficiently* find the best t_cost for the
given m_cost and target running time, on the current system under its
current workload, while at the same time computing a hash. In other
words, for a fixed m_cost, it should find the smallest valid t_cost
such that the time taken is longer than target_msec milliseconds.
The problem is that on memory-constrained devices, you might want to
use as much memory you can, and then keep computing on that memory for
a certain amount of time, say 5 seconds, when you don't know which
t_cost to use for that amount of time, and you don't want to do trial-
and-error or extrapolation from faster calls.
I think this is an extremely important API to mandate, since if the
developer needs it, and it isn't provided for them, they either have to
do trial-and-error or heuristically guess-then-check t_cost which is
not efficient, OR they have to modify or even re-implement the
password-hashing library, likely losing important optimizations and
thus giving the attacker an advantage in the process.
It will also result higher t_cost being chosen in practice. Developers
can hard-code the time cost in units of seconds, so that t_cost is
always as high as it can be without impacting usability. Without this
API, they'd have to run benchmarks on the system to select t_cost, or
hard-code non-optimal costs, like the libsodium API [1,2,3] does.
It's possible to efficiently provide this API for the iteration count
in PBKDF2 as well as for 'p' in scrypt. I think it is also possible for
Argon2.
Even if we decide not to require this API, please consider adding it to
your implementations.
Thanks,
-Taylor
[1] http://doc.libsodium.org/password_hashing/index.html
[2] https://github.com/jedisct1/libsodium/blob/e326ef90301cf570a5e6691c
46d4a1d37adc1e07/src/libsodium/include/sodium/crypto_pwhash_scryptsalsa
208sha256.h#L28
[3] https://github.com/jedisct1/libsodium/blob/e326ef90301cf570a5e6691c
46d4a1d37adc1e07/src/libsodium/crypto_pwhash/scryptsalsa208sha256/pwhas
h_scryptsalsa208sha256.c#L18
p.s. Apologies if this has been discussed before. There's a lot on this
list I haven't read.
Sascha Schmidt
2015-10-12 06:57:39 UTC
Permalink
Hi.
I'm not sure if the API you suggested shouldn't be simplified. In the end a
search function mostly caters to developers that are unfamiliar with
password hashing. Anyone else could easily do the search on their own.
While the input and salt may slightly change the runtime for schemes with a
data-dependent memory-access pattern, no one is going to search for the
optimal parameters for every password. Exposing in, out and salt could give
the false impression that they are relevant for the search process or may
even lead to developers using their own passwords for testing.

Sincerely
Sascha Schmidt

P.S.: We have implemented such a search application for Catena:
https://github.com/medsec/catena-axungia
It searches under supplied constraints for time, memory and memory-hardness.
Post by Bill Cox
Some entries (I think including Yescrypt and TwoCats) do something
similar, and provide estimated parameters to meet your time goal. These
tools take a max_memory and target_time parameters. The more typical case
is that we cannot fill as much memory as you would like for a given target
time. so t_cost is set to the minimum value, and an m_cost parameter is
suggested to target the desired runtime. In rare cases, you want to fill
memory and keep rehashing, in which case m_cost is set to max_mem and
t_cost is adjusted accordingly.
I have yet to see an actual deployment making use of more than the minimum
t_cost parameter in any application, so I'm not a fan of that parameter,
but it's there, and both m_cost and t_cost should be suggested for Argon2.
I suspect their team already has something like this in the works. They
are welcome to look at my code in twocats-common.c to see what I did.
Bill
Post by Taylor Hornby
Hi,
I'm not sure if it's the PHC's job to standardize on an API for the
winning hash function, but I'm going to strongly argue that
int PHS_find_t_cost(
void *out, /* same */
size_t outlen, /* same */
const void *in, /* same */
size_t inlen, /* same */
const void *salt, /* same */
size_t saltlen, /* same */
unsigned int target_msec, /* target milliseconds */
unsigned int *t_cost, /* output parameter */
unsigned int m_cost /* same */
);
Calling this API should *efficiently* find the best t_cost for the
given m_cost and target running time, on the current system under its
current workload, while at the same time computing a hash. In other
words, for a fixed m_cost, it should find the smallest valid t_cost
such that the time taken is longer than target_msec milliseconds.
The problem is that on memory-constrained devices, you might want to
use as much memory you can, and then keep computing on that memory for
a certain amount of time, say 5 seconds, when you don't know which
t_cost to use for that amount of time, and you don't want to do trial-
and-error or extrapolation from faster calls.
I think this is an extremely important API to mandate, since if the
developer needs it, and it isn't provided for them, they either have to
do trial-and-error or heuristically guess-then-check t_cost which is
not efficient, OR they have to modify or even re-implement the
password-hashing library, likely losing important optimizations and
thus giving the attacker an advantage in the process.
It will also result higher t_cost being chosen in practice. Developers
can hard-code the time cost in units of seconds, so that t_cost is
always as high as it can be without impacting usability. Without this
API, they'd have to run benchmarks on the system to select t_cost, or
hard-code non-optimal costs, like the libsodium API [1,2,3] does.
It's possible to efficiently provide this API for the iteration count
in PBKDF2 as well as for 'p' in scrypt. I think it is also possible for
Argon2.
Even if we decide not to require this API, please consider adding it to
your implementations.
Thanks,
-Taylor
[1] http://doc.libsodium.org/password_hashing/index.html
[2] https://github.com/jedisct1/libsodium/blob/e326ef90301cf570a5e6691c
46d4a1d37adc1e07/src/libsodium/include/sodium/crypto_pwhash_scryptsalsa
208sha256.h#L28
<https://github.com/jedisct1/libsodium/blob/e326ef90301cf570a5e6691c%0D46d4a1d37adc1e07/src/libsodium/include/sodium/crypto_pwhash_scryptsalsa%0D208sha256.h#L28>
[3] https://github.com/jedisct1/libsodium/blob/e326ef90301cf570a5e6691c
46d4a1d37adc1e07/src/libsodium/crypto_pwhash/scryptsalsa208sha256/pwhas
h_scryptsalsa208sha256.c#L18
<https://github.com/jedisct1/libsodium/blob/e326ef90301cf570a5e6691c%0D46d4a1d37adc1e07/src/libsodium/crypto_pwhash/scryptsalsa208sha256/pwhas%0Dh_scryptsalsa208sha256.c#L18>
p.s. Apologies if this has been discussed before. There's a lot on this
list I haven't read.
Krisztián Pintér
2015-10-12 08:05:10 UTC
Permalink
Post by Taylor Hornby
I'm not sure if it's the PHC's job to standardize on an API for the
winning hash function, but I'm going to strongly argue that
int PHS_find_t_cost(
i was arguing against such an API for the following two reasons:

1, it is not the goal. all too often, we have other considerations how
costly we want the KDF to be, not merely the machine's capabilities.
examples include:

* i except a huge authentication workload on a server, which is hard to emulate.
* i plan to use the same password from other, low end clients.
* i have highly variable workload and memory conditions. measurements
are not representative.

simply knowing the situation usually gives a better idea about how
much memory and clock cycles you want to allocate.


2, it is easy to do

in case you really want to do some benchmarks, do it yourself! if your
aim is, say, 100ms, you can do a quick benchmark in approximately
2-300ms of time, maybe a second if you want to play with different
memory settings as well. you need to do it once, so it does not seem
to be problem.

one can also collect performance data on a website. for different
architectures, we need a 5x5 table of of typical memory and time
settings.

Loading...