Discussion:
[PHC] Interest in specification of modular crypt format
Hugo Landau
2015-09-06 16:20:10 UTC
Permalink
Greetings.

Now that a winner has been announced, I wondered if the PHC has any
interest in specifying a modular crypt format to supplement the final
specification for Argon2?

The modular crypt format is used by UNIX-like systems to encode
passwords in system databases (sha256-crypt is "$5$", sha512-crypt is
"$6$" etc.), but is also more widely used as a password storage format.
Specifying the modular crypt format early as part of the specification
itself would avoid any risk of balkanization of a supposed standard and
encourage adoption.

The most exhaustive documentation of the modular crypt format that I am
aware of is that maintained by the Python passlib project:
https://pythonhosted.org/passlib/lib/passlib.hash.html#unix-modular-crypt-hashes

Take note of PBKDF2, which, while a standard, appears to have numerous
different incompatible formats: Passlib's PBKDF2, cta_pbkdf2_sha1,
dlitz_pbkdf2_sha1, etc.
Specifying a format is trivial but may be highly advantageous.

Hugo Landau
Thomas Pornin
2015-09-06 18:39:52 UTC
Permalink
Post by Hugo Landau
Now that a winner has been announced, I wondered if the PHC has any
interest in specifying a modular crypt format to supplement the final
specification for Argon2?
My opinion is that such a specification should really exist, and,
preferably, be included right into the "official specification" (maybe
as an appendix) and into the reference implementation(s) as well.
Lack of a definite, standard format indeed always leads to a plethora
of incompatible formats that cause severe headaches down the line
(e.g. when switching implementations but reusing an existing database
of hashed passwords).

If the Argon2 authors do not have time for that, I can contribute the
specification and code if needed (I have not written anything to that
effect yet for Argon2, but I did for Makwa, so I believe I can do that
job properly).


--Thomas Pornin
Simon Josefsson
2015-09-06 20:02:28 UTC
Permalink
Post by Thomas Pornin
Post by Hugo Landau
Now that a winner has been announced, I wondered if the PHC has any
interest in specifying a modular crypt format to supplement the final
specification for Argon2?
My opinion is that such a specification should really exist, and,
preferably, be included right into the "official specification" (maybe
as an appendix) and into the reference implementation(s) as well.
Lack of a definite, standard format indeed always leads to a plethora
of incompatible formats that cause severe headaches down the line
(e.g. when switching implementations but reusing an existing database
of hashed passwords).
If the Argon2 authors do not have time for that, I can contribute the
specification and code if needed (I have not written anything to that
effect yet for Argon2, but I did for Makwa, so I believe I can do that
job properly).
I worked on this for scrypt, see

https://gitlab.com/jas/scrypt-unix-crypt/blob/master/unix-scrypt.txt

and I am interested in working on this for Argon2 too.

I don't believe it is important to include this in the official
specification. It should be fine to keep it in a separate document, and
for a disjoint, or only partially overlapping group of people, to work
on that project. I do agree that a plethora of incompatible formats is
a severe pain, but if a number of people now agree on a writeup and
starts to experiment, I believe we can get closure on something that
should be "good enough" for others to accept. That said, I'm not
opposed to including things in the official specification, if consensus
on details can be established.

/Simon
Solar Designer
2015-09-07 10:55:30 UTC
Permalink
Post by Simon Josefsson
Post by Thomas Pornin
If the Argon2 authors do not have time for that, I can contribute the
specification and code if needed (I have not written anything to that
effect yet for Argon2, but I did for Makwa, so I believe I can do that
job properly).
I worked on this for scrypt, see
https://gitlab.com/jas/scrypt-unix-crypt/blob/master/unix-scrypt.txt
and I am interested in working on this for Argon2 too.
I briefly contributed to that too, and I've got additional thoughts on
this bikeshed since then. Specifically:

The syntax we introduced for scrypt at the time turned out not to
accommodate some other encodings people came up with (some may have been
already in use when we came up with ours), whereas it would have been
nice to be able to convert any unofficial encoding to/from the official
one. The issue is that, for implementation and specification
simplicity, I chose to treat the salt substring verbatim, passing it
right into scrypt with no decoding. Some other people chose to generate
the scrypt salt as arbitrary binary data and to encode it. Thus, when
converting their encodings to the one described at the URL above, we may
get bad (non-printable or otherwise special) characters in the salt
substring, making such conversion unusable in general. So maybe we need
to use encoded salts, even though this requires us to decode them in
implementations and to decide on how we handle incorrect salt encodings
(e.g., characters outside of the base64 alphabet).

If we continue to build upon the traditional crypt(3) alphabet and
encoding (like the above specification for scrypt does), then I'd also
like to include a way to represent the numeric parameters in a more
compact form: let the little-endian encodings be truncated when the
values are small. So e.g. instead of:

$7$C6..../....SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D

it would be valid to use:

$7$C6$/$SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D

or if we want to keep the number of '$' characters fixed, then either
require the '$' delimiters to always be present or maybe introduce '+'
or '=' for truncation (a character already used in the other base64
encoding's alphabet, so likely safe to use):

$7$C6+/+SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D

The savings for classic scrypt are small, but with yescrypt's additional
parameters they are more significant.

(These examples don't yet switch to encoded salts. I am keeping these
two topics separate in this message, although obviously if we implement
both changes then we'd need to merge them in some way.)

There are also some good reasons in favor of using the RFC specified
base64 encoding, but that's a deviation from what was used traditionally
and unfortunately it appears incompatible with the truncation feature
(and moreover it has padding characters, resulting in even longer
strings).

What do others think on all of this? Thomas?
Post by Simon Josefsson
I don't believe it is important to include this in the official
specification. It should be fine to keep it in a separate document, and
for a disjoint, or only partially overlapping group of people, to work
on that project. I do agree that a plethora of incompatible formats is
a severe pain, but if a number of people now agree on a writeup and
starts to experiment, I believe we can get closure on something that
should be "good enough" for others to accept. That said, I'm not
opposed to including things in the official specification, if consensus
on details can be established.
I think it's desirable for us to arrive at a common approach at encoding
these things for all PHC-endorsed schemes. It is less important whether
this would be part of the specifications for the individual hashing
schemes or not, although I think that if not right away then it should
retroactively be added to revisions of those later.

Alexander
Darren J Moffat
2015-09-07 11:41:42 UTC
Permalink
Post by Solar Designer
If we continue to build upon the traditional crypt(3) alphabet and
encoding (like the above specification for scrypt does), then I'd also
like to include a way to represent the numeric parameters in a more
compact form: let the little-endian encodings be truncated when the
$7$C6..../....SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
$7$C6$/$SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
or if we want to keep the number of '$' characters fixed, then either
require the '$' delimiters to always be present or maybe introduce '+'
or '=' for truncation (a character already used in the other base64
$7$C6+/+SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
The savings for classic scrypt are small, but with yescrypt's additional
parameters they are more significant.
In Solaris crypt(3C) we use this syntax:

If the first character of salt is "$", crypt() uses crypt.conf(4) to
determine which shared module to load for the encryption algorithm.
The algorithm name crypt() uses to search in crypt.conf is the string
between the first and second "$", or between the first "$" and first
"," if a "," comes before the second "$".

Basically ',' is the argument separator so you can end up with things
like this:

$5,rounds=6000$nlxmTTpz$

So you could have

$7,C6$SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
Post by Solar Designer
I think it's desirable for us to arrive at a common approach at encoding
these things for all PHC-endorsed schemes. It is less important whether
this would be part of the specifications for the individual hashing
schemes or not, although I think that if not right away then it should
retroactively be added to revisions of those later.
I'd like to be sure that when we add Argon2 support for UNIX user
password hashes in Solaris that the syntax will be compatible with what
Linux vendors add. It is also important that we have a Python
implementation since parts of OpenStack can be involved in setting the
password hashes at deployment time.
--
Darren J Moffat
Solar Designer
2015-09-07 15:03:03 UTC
Permalink
Post by Darren J Moffat
Post by Solar Designer
If we continue to build upon the traditional crypt(3) alphabet and
encoding (like the above specification for scrypt does), then I'd also
like to include a way to represent the numeric parameters in a more
compact form: let the little-endian encodings be truncated when the
$7$C6..../....SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
$7$C6$/$SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
or if we want to keep the number of '$' characters fixed, then either
require the '$' delimiters to always be present or maybe introduce '+'
or '=' for truncation (a character already used in the other base64
$7$C6+/+SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
The savings for classic scrypt are small, but with yescrypt's additional
parameters they are more significant.
If the first character of salt is "$", crypt() uses crypt.conf(4) to
determine which shared module to load for the encryption algorithm.
The algorithm name crypt() uses to search in crypt.conf is the string
between the first and second "$", or between the first "$" and first
"," if a "," comes before the second "$".
Basically ',' is the argument separator so you can end up with things
$5,rounds=6000$nlxmTTpz$
So you could have
$7,C6$SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
Oh, this is actually consistent with what I just wrote in the previous
message: use another delimiter between parameters, so that some may be
omitted and defaults used. In the case of Solaris, it's the exact same
thing with being able to omit the rounds specification, right?

So maybe:

$7,14,8$base64salt$base64hash

which would mean classic scrypt and imply p=1, or:

$7,14,8,16$base64salt$base64hash

for p=16, etc. And yescrypt extras would then be in the form of
changing the initial "7" e.g. to "7a" and a few others (to encode the
flags right there), and optionally listing extra decimal numbers after
scrypt's p.

For yescrypt, part of the question is whether we want to invent a new
encoding for scrypt while at it, or whether we want to build upon one
that is already in use.

Personally, I like crypt's base64 and I like to use it for parameters as
well, but I recognize there are valid reasons for RFC base64 and decimal.

Alexander
Solar Designer
2015-09-25 01:32:01 UTC
Permalink
Post by Solar Designer
If we continue to build upon the traditional crypt(3) alphabet and
encoding (like the above specification for scrypt does), then I'd also
like to include a way to represent the numeric parameters in a more
compact form: let the little-endian encodings be truncated when the
$7$C6..../....SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
$7$C6$/$SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
or if we want to keep the number of '$' characters fixed, then either
require the '$' delimiters to always be present or maybe introduce '+'
or '=' for truncation (a character already used in the other base64
$7$C6+/+SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
The savings for classic scrypt are small, but with yescrypt's additional
parameters they are more significant.
BTW, an even more compact encoding of numeric parameters is possible,
through having the first B64 character of every number encode whether
and how many additional characters are expected for that number. Small
numbers e.g. up to 31 or 47 would be encoded with just one B64 character
and no other character needed to indicate this short encoding length.
This would be similar to how UTF-8 works, with 7-bit ASCII only needing
one octet per character. And we would get similar tradeoffs, too. ;-(
We would need to decide whether overlong encodings would be possible
(and their use would then be forbidden in the spec, much like Thomas'
current spec forbids leading zeroes in decimal numbers) or whether there
would be only one way to encode a given number without us having to
explicitly forbid anything. We can do the latter by starting each of
the larger sub-ranges right where the previous sub-range ended:

For example, by value of first B64 char:

0 to 47 - 1 char, range 0 to 47
48 to 55 - 2 chars, range 48 to 559 (using 3+6 bits of the two chars)
56 to 59 - 3 chars, range 560 to 16943 (2+12 bits)
60 to 61 - 4 chars, range 16944 to 541231 (1+18 bits)
62 - 5 chars, range 541232 to 17318447 (0+24 bits)
63 - 6 chars, range 17318448 to 1091060271 (0+30 bits)

This unnecessarily(?) allows going 1.6% higher than 2^30, which would
have been the desired maximum for (ye)scrypt's numeric parameters, but
on the bright side the non-overlapping ranges result in there being only
1 way to encode a given number, and some encodings are shorter (e.g. for
512 to 559, which would otherwise require 3 chars). If a wider range
needs to be encoded, we can limit the 1-char range e.g. to 0 to 31 (and
proceed to up to 7 chars and 36+ bits), but I actually wanted to avoid
the risk of ever hitting 2^31 here (would be risky if implemented with
signed int somewhere, e.g. in a scripting language that uses signed
32-bit int behind the scenes).

Is this unjustified complexity or is it a good way to achieve several
goals at once (very compact encoding, no two ways to encode a number,
use of only the B64 alphabet with no need for an "end of number" char,
no way to hit 2^31)?

Is there an existing encoding like this that we could reuse, or would we
have to invent our own like I have above?

With this encoding, my example above could become simply:

$7X$C6/SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D

because both '6' (for r=8) and '/' (for p=1) are small enough that they
fall into the 0 to 47 range needing only 1 character. (I put an "X"
here after the "$7" to indicate that we'd need to somehow alter the
prefix if we implement this for scrypt in particular, since it wouldn't
otherwise be detectable. And we'd like change other things at once.
By "X" I mean eXperimental - just a placeholder I use until we've
decided on an encoding.)

Of course, we can also introduce a '$' before the actual salt value
artificially if we want to, such as to allow a trivial parser (without
decoding of the numeric parameters) to separate the parameters from the
salt value. Then we'd have something like:

$7X$C6/$SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D

Also, if we like, even the initial log2(N), which was 1 char already,
could be treated as encoded using this scheme, for consistency. For
practical purposes, it'd remain 1 char anyway (since 2^47 is a lot), and
values 48 or higher would be invalid anyway (2^48 is way too high) no
matter which encoding.

Alexander
Solar Designer
2015-09-25 02:17:33 UTC
Permalink
Post by Solar Designer
0 to 47 - 1 char, range 0 to 47
48 to 55 - 2 chars, range 48 to 559 (using 3+6 bits of the two chars)
56 to 59 - 3 chars, range 560 to 16943 (2+12 bits)
60 to 61 - 4 chars, range 16944 to 541231 (1+18 bits)
62 - 5 chars, range 541232 to 17318447 (0+24 bits)
63 - 6 chars, range 17318448 to 1091060271 (0+30 bits)
I've attached a program to calculate the ranges for 4 kinds of encodings
like this (varying by the single-char range, which in turn changes the
rest), and its output. Out of those 4, I still think the one above is
most suitable, unless we actually want to be able to go 2^31 or higher.

Alexander
Solar Designer
2015-09-26 02:06:00 UTC
Permalink
Post by Solar Designer
0 to 47 - 1 char, range 0 to 47
48 to 55 - 2 chars, range 48 to 559 (using 3+6 bits of the two chars)
56 to 59 - 3 chars, range 560 to 16943 (2+12 bits)
60 to 61 - 4 chars, range 16944 to 541231 (1+18 bits)
62 - 5 chars, range 541232 to 17318447 (0+24 bits)
63 - 6 chars, range 17318448 to 1091060271 (0+30 bits)
Attached is an implementation of the above. There are two functions:

char *encode64_uint32(char *dst, size_t dstlen, uint32_t src);
const char *decode64_uint32(uint32_t *dst, const char *src);

encode64_uint32() encodes the number in src into the string pointed to
by dst, and returns a pointer to right after the encoded number.

decode64_uint32() decodes the number in the string src into dst, and
returns a pointer to right after the encoded number.

Both functions return NULL on error.

As you can see, the encoding/decoding functions are quite simple.

The program also includes unit tests for these functions, including
testing them exhaustively on the 32-bit range (yes, including beyond the
expected valid range, making sure the failures outside of that range are
as expected). All of these tests pass for me. (I should also add tests
of decode64_uint32() on invalid inputs.)

Build and run the program with:

gcc sim-encode.c -o sim-encode -s -O2 -march=native -fopenmp -Wall && ./sim-encode

The output for a handful of special values is:

0 .
47 j
48 k.
559 rz
560 s..
16943 vzz
16944 w...
541231 xzzz
541232 y....
17318447 yzzzz
17318448 z.....
1091060271 zzzzzz
1091060272 couldn't encode
2147483647 couldn't encode
2147483648 couldn't encode
4294967295 couldn't encode

I chose to use a big-endian order, which felt consistent with use of the
first character to specify the magnitude of the number.

Any comments?

Alexander
Solar Designer
2015-09-26 15:25:13 UTC
Permalink
Post by Solar Designer
char *encode64_uint32(char *dst, size_t dstlen, uint32_t src);
const char *decode64_uint32(uint32_t *dst, const char *src);
encode64_uint32() encodes the number in src into the string pointed to
by dst, and returns a pointer to right after the encoded number.
decode64_uint32() decodes the number in the string src into dst, and
returns a pointer to right after the encoded number.
The intent is that we can easily call these functions multiple times in
a row, to encode/decode multiple numbers.
Post by Solar Designer
As you can see, the encoding/decoding functions are quite simple.
Attached is a slightly shorter version: I removed the "base" variable,
and instead update src or *dst (when encoding or decoding) directly.
Perhaps this can be shortened further, but this wouldn't necessarily
make it simpler.
Post by Solar Designer
gcc sim-encode.c -o sim-encode -s -O2 -march=native -fopenmp -Wall && ./sim-encode
Alexander
Alexander Cherepanov
2015-09-27 12:55:52 UTC
Permalink
Post by Solar Designer
Post by Solar Designer
If we continue to build upon the traditional crypt(3) alphabet and
encoding (like the above specification for scrypt does), then I'd also
like to include a way to represent the numeric parameters in a more
compact form: let the little-endian encodings be truncated when the
$7$C6..../....SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
$7$C6$/$SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
or if we want to keep the number of '$' characters fixed, then either
require the '$' delimiters to always be present or maybe introduce '+'
or '=' for truncation (a character already used in the other base64
$7$C6+/+SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
The savings for classic scrypt are small, but with yescrypt's additional
parameters they are more significant.
BTW, an even more compact encoding of numeric parameters is possible,
through having the first B64 character of every number encode whether
and how many additional characters are expected for that number. Small
numbers e.g. up to 31 or 47 would be encoded with just one B64 character
and no other character needed to indicate this short encoding length.
This would be similar to how UTF-8 works, with 7-bit ASCII only needing
one octet per character. And we would get similar tradeoffs, too. ;-(
We would need to decide whether overlong encodings would be possible
(and their use would then be forbidden in the spec, much like Thomas'
current spec forbids leading zeroes in decimal numbers) or whether there
would be only one way to encode a given number without us having to
explicitly forbid anything. We can do the latter by starting each of
0 to 47 - 1 char, range 0 to 47
48 to 55 - 2 chars, range 48 to 559 (using 3+6 bits of the two chars)
56 to 59 - 3 chars, range 560 to 16943 (2+12 bits)
60 to 61 - 4 chars, range 16944 to 541231 (1+18 bits)
62 - 5 chars, range 541232 to 17318447 (0+24 bits)
63 - 6 chars, range 17318448 to 1091060271 (0+30 bits)
This unnecessarily(?) allows going 1.6% higher than 2^30, which would
have been the desired maximum for (ye)scrypt's numeric parameters, but
on the bright side the non-overlapping ranges result in there being only
1 way to encode a given number, and some encodings are shorter (e.g. for
512 to 559, which would otherwise require 3 chars). If a wider range
needs to be encoded, we can limit the 1-char range e.g. to 0 to 31 (and
proceed to up to 7 chars and 36+ bits), but I actually wanted to avoid
the risk of ever hitting 2^31 here (would be risky if implemented with
signed int somewhere, e.g. in a scripting language that uses signed
32-bit int behind the scenes).
Is this unjustified complexity or is it a good way to achieve several
goals at once (very compact encoding, no two ways to encode a number,
use of only the B64 alphabet with no need for an "end of number" char,
no way to hit 2^31)?
The problem with this approach is that the encoding is closely tied to
the range of allowed values. Argon2 spec says that the range for p is
1..255 while m and t can go upto 2^32-1. Should they use the same
encoding? Should it be the same as for (ye)scrypt?

Relatively widely used encoding for similar cases is Variable-length
quantity[1]. There is even "Base64 VLQ" in Source Map[2] but it encodes
a sign in a least significant bit which could inconvenient if we want to
encode unsigned integers only.

[1] https://en.wikipedia.org/wiki/Variable-length_quantity
[2]
https://docs.google.com/document/d/1U1RGAehQwRypUTovF1KRlpiOFze0b-_2gc6fAH0KY0k
--
Alexander Cherepanov
Solar Designer
2015-10-11 01:28:15 UTC
Permalink
Post by Alexander Cherepanov
Post by Solar Designer
Is this unjustified complexity or is it a good way to achieve several
goals at once (very compact encoding, no two ways to encode a number,
use of only the B64 alphabet with no need for an "end of number" char,
no way to hit 2^31)?
The problem with this approach is that the encoding is closely tied to
the range of allowed values. Argon2 spec says that the range for p is
1..255 while m and t can go upto 2^32-1. Should they use the same
encoding? Should it be the same as for (ye)scrypt?
Yeah, having the encoding tied to the range of allowed values is both
good and bad. As an option, Argon2 specifically could choose to limit
its parameters to 2^30. (No, I am not suggesting this currently.)
Post by Alexander Cherepanov
Relatively widely used encoding for similar cases is Variable-length
quantity[1]. There is even "Base64 VLQ" in Source Map[2] but it encodes
a sign in a least significant bit which could inconvenient if we want to
encode unsigned integers only.
[1] https://en.wikipedia.org/wiki/Variable-length_quantity
[2]
https://docs.google.com/document/d/1U1RGAehQwRypUTovF1KRlpiOFze0b-_2gc6fAH0KY0k
Thank you! Besides the sign bit issue, this existing "Base 64 VLQ"
appears to allow for trailing chars with all-zero data bytes, which
wouldn't alter the value (since it's a little-endian encoding).
So it does not force deterministic encoding.

"Base 64 VLQ
The VLQ is a Base64 value, where the most significant bit (the
6th bit) is used as the continuation bit, and the "digits" are encoded
into the string least significant first, and where the least significant
bit of the first digit is used as the sign bit. Note: The values that
can be represent by the VLQ Base64 encoded are limited to 32 bit
quantities until some use case for larger values is presented."

I just found this:

"BASE64 VLQ CODEC (COder/DECoder) AND SOURCEMAP V3 MAPPINGS PARSER"
http://murzwin.com/base64vlq.html

and experimented with it. There are many ways to encode 0:

A
B
gA
ggA
gggA

as well as non-zero numbers, e.g. all of these decode to 16:

gB
ghA
ghgA

and all of these to 17:

iB
ihA
ihgA

Also, there exist invalid encodings that a robust parser will need to
reject. "Invalid VLQ64 encoding, continuation bit set at last character
while decoding" as that JavaScript parser says e.g. for "g" and "gg".

So I am not happy about this specific encoding, although it's something
we could use if we were to use the combination of RFC Base64 charset and
compact encoding for numbers, which as it appears we're not going to.

Alexander

Anthony Ferrara
2015-09-27 13:40:50 UTC
Permalink
All,
Post by Solar Designer
Post by Solar Designer
If we continue to build upon the traditional crypt(3) alphabet and
encoding (like the above specification for scrypt does), then I'd also
like to include a way to represent the numeric parameters in a more
compact form: let the little-endian encodings be truncated when the
$7$C6..../....SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
$7$C6$/$SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
or if we want to keep the number of '$' characters fixed, then either
require the '$' delimiters to always be present or maybe introduce '+'
or '=' for truncation (a character already used in the other base64
$7$C6+/+SodiumChloride$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
The savings for classic scrypt are small, but with yescrypt's additional
parameters they are more significant.
BTW, an even more compact encoding of numeric parameters is possible,
Please, can the encoding be kept human-readable? Going with one-off
encodings does look elegant, but it makes the implementation
significantly more error prone.

People are even confused by the fact that the salt-boundary in bcrypt
occurs mid-character (leading to only 4 possible character values for
the last salt character).

Instead, can we focus on readability? For parameters that have well
defined defaults, simply don't require that parameter to be specified.
For those that don't have well defined defaults, then always require
them. Yes, it is a little bit more parsing overhead on the crypt()
implementation, but it greatly reduces the overhead on the calling
code. Especially for those building wrappers (which happens all the
time).

I do like the approach of specifically identifying parameter that was
brought up in the other thread. Something along the lines for scrypt:

$7$n=14;r=8;p=1$ALotOfSalt$Hash

But since p=1 could reasonably be default, the following is identical:

$7$n=14;r=8$ALotOfSalt$Hash

Whether a parameter is power-of-2 or base 10 would depend on the
algorithm. Also, please don't define the parameter order as rigid.
It's simple enough to implement arbitrary order, so why put arbitrary
restrictions?

So my overall suggestion is make it easier on the developer using the
API, rather than the one writing it. After all, there are going to be
FAR more people writing a salt string then there will be writing the
backend implementation. Optimize for the greater use-case...

Just my $0.02

Anthony
Solar Designer
2015-09-27 18:52:14 UTC
Permalink
Hi Anthony,

Thank you for sharing your opinion.
Post by Anthony Ferrara
Please, can the encoding be kept human-readable?
Maybe. At least I've counted your vote. ;-)
Post by Anthony Ferrara
Going with one-off encodings does look elegant,
What do you mean by one-off here? That no other hashing scheme would
use the same encoding? Ideally, we'd come up with an encoding, whether
human-readable or not, that multiple schemes would adopt.
Post by Anthony Ferrara
but it makes the implementation significantly more error prone.
Does it? Maybe that depends on the kind of errors you care about vs.
disregard. In my proposed numeric encoding, I am focusing on having
implementation errors detectable in normal encoding/decoding of valid
inputs, or being impossible (e.g., can't overflow an int, can't encode
the same parameter more than once). With a human-readable encoding like
Thomas', there are less critical potential errors that, on the other
hand, would tend to go undetected for long. Even if the spec has lots
of MUST's and MUST NOT's, implementations will end up parsing strings
like p=1,t=2,p=3 (is it p=1 or p=3, then? parsing will vary between
implementations, and only a handful will actually reject such strings).
Post by Anthony Ferrara
People are even confused by the fact that the salt-boundary in bcrypt
occurs mid-character (leading to only 4 possible character values for
the last salt character).
Yes, and I have no good solution for that one (short of the workaround
of using multiples of 6 for bit sizes of salts and hashes, or for
salts+hashes combined as Alexander Cherepanov mentioned to me off-list).

So it's a problem common for compact and verbose encodings, and not a
reason to choose one over the other.
Post by Anthony Ferrara
Instead, can we focus on readability?
For parameters that have well
defined defaults, simply don't require that parameter to be specified.
This means the encoding is not deterministic. A parameter can be
omitted, or it can be explicitly included with its default value.

That's bad. There's value in having canonical encodings, and in there
being only one canonical encoding for a given set of parameters, salt,
and hash.

If we change "don't require that parameter to be specified" to "require
that parameter not to be specified", that's better, but then many
parsers will not actually enforce this rule. Maybe that's a minor
enough issue to accept it. (The issue with multiple specification of a
parameter is worse.)
Post by Anthony Ferrara
For those that don't have well defined defaults, then always require
them. Yes, it is a little bit more parsing overhead on the crypt()
implementation, but it greatly reduces the overhead on the calling
code. Especially for those building wrappers (which happens all the
time).
I do like the approach of specifically identifying parameter that was
$7$n=14;r=8;p=1$ALotOfSalt$Hash
$7$n=14;r=8$ALotOfSalt$Hash
Whether a parameter is power-of-2 or base 10 would depend on the
algorithm. Also, please don't define the parameter order as rigid.
It's simple enough to implement arbitrary order, so why put arbitrary
restrictions?
Deterministic canonical encoding. Side-stepping the issue of multiple
specification of a parameter (as long as the parser actually rejects
encodings with parameters appearing in unexpected order).
Post by Anthony Ferrara
So my overall suggestion is make it easier on the developer using the
API, rather than the one writing it. After all, there are going to be
FAR more people writing a salt string then there will be writing the
backend implementation. Optimize for the greater use-case...
This makes sense to me. We will need to provide an API for generating
salt strings (or "setting strings", as they're called for bsdicrypt and
on) anyway, but you have a valid point that, especially if the encoding
is simple for an application developer to understand, many developers
will be generating such strings on their own.

For PHP and such, I think this pretty much implies the encoding will
also need to use RFC Base64. Then it's just sprintf of some decimal
numbers and some base64()'s. So my point remains that Thomas' proposed
encoding is neither here nor there.

Thanks again,

Alexander
Anthony Ferrara
2015-09-27 20:46:48 UTC
Permalink
Alexander,
Post by Solar Designer
Hi Anthony,
Thank you for sharing your opinion.
Post by Anthony Ferrara
Please, can the encoding be kept human-readable?
Maybe. At least I've counted your vote. ;-)
Post by Anthony Ferrara
Going with one-off encodings does look elegant,
What do you mean by one-off here? That no other hashing scheme would
use the same encoding? Ideally, we'd come up with an encoding, whether
human-readable or not, that multiple schemes would adopt.
Unless I misread, it seemed like there was a suggestion to change the
encoding per parameter. Parameters that have no valid 0 would be encoded
where the first representable value is 1.

If that was not the case, then never mind.
Post by Solar Designer
Post by Anthony Ferrara
but it makes the implementation significantly more error prone.
Does it? Maybe that depends on the kind of errors you care about vs.
disregard. In my proposed numeric encoding, I am focusing on having
implementation errors detectable in normal encoding/decoding of valid
inputs, or being impossible (e.g., can't overflow an int, can't encode
the same parameter more than once). With a human-readable encoding like
Thomas', there are less critical potential errors that, on the other
hand, would tend to go undetected for long. Even if the spec has lots
of MUST's and MUST NOT's, implementations will end up parsing strings
like p=1,t=2,p=3 (is it p=1 or p=3, then? parsing will vary between
implementations, and only a handful will actually reject such strings).
I would say that this is simply an error and the spec should define it as
such. A few test vectors for the error case would be good there.
Post by Solar Designer
Post by Anthony Ferrara
People are even confused by the fact that the salt-boundary in bcrypt
occurs mid-character (leading to only 4 possible character values for
the last salt character).
Yes, and I have no good solution for that one (short of the workaround
of using multiples of 6 for bit sizes of salts and hashes, or for
salts+hashes combined as Alexander Cherepanov mentioned to me off-list).
For the record, I am not saying that problem should be "solved". Just to
keep in mind that making things more clever and more efficient on packing
is fine, but it may make the generation of the sale less intuitive.
Something to keep in mind.
Post by Solar Designer
So it's a problem common for compact and verbose encodings, and not a
reason to choose one over the other.
Post by Anthony Ferrara
Instead, can we focus on readability?
For parameters that have well
defined defaults, simply don't require that parameter to be specified.
This means the encoding is not deterministic. A parameter can be
omitted, or it can be explicitly included with its default value.
That's bad. There's value in having canonical encodings, and in there
being only one canonical encoding for a given set of parameters, salt,
and hash.
If we change "don't require that parameter to be specified" to "require
that parameter not to be specified", that's better, but then many
parsers will not actually enforce this rule. Maybe that's a minor
enough issue to accept it. (The issue with multiple specification of a
parameter is worse.)
Fair enough. Though I don't know which I would rather weight for. My
instinct is to make the implementation more difficult to make the lives of
the users easier. But that is not a strict tradeoff. More of something to
keep in mind as a rule of thumb.
Post by Solar Designer
Post by Anthony Ferrara
For those that don't have well defined defaults, then always require
them. Yes, it is a little bit more parsing overhead on the crypt()
implementation, but it greatly reduces the overhead on the calling
code. Especially for those building wrappers (which happens all the
time).
I do like the approach of specifically identifying parameter that was
$7$n=14;r=8;p=1$ALotOfSalt$Hash
$7$n=14;r=8$ALotOfSalt$Hash
Whether a parameter is power-of-2 or base 10 would depend on the
algorithm. Also, please don't define the parameter order as rigid.
It's simple enough to implement arbitrary order, so why put arbitrary
restrictions?
Deterministic canonical encoding. Side-stepping the issue of multiple
specification of a parameter (as long as the parser actually rejects
encodings with parameters appearing in unexpected order).
Post by Anthony Ferrara
So my overall suggestion is make it easier on the developer using the
API, rather than the one writing it. After all, there are going to be
FAR more people writing a salt string then there will be writing the
backend implementation. Optimize for the greater use-case...
This makes sense to me. We will need to provide an API for generating
salt strings (or "setting strings", as they're called for bsdicrypt and
on) anyway, but you have a valid point that, especially if the encoding
is simple for an application developer to understand, many developers
will be generating such strings on their own.
For PHP and such, I think this pretty much implies the encoding will
also need to use RFC Base64. Then it's just sprintf of some decimal
numbers and some base64()'s. So my point remains that Thomas' proposed
encoding is neither here nor there.
Completely fair. I am less concerned around the encoding of the salt than
the encoding and specification of the parameters. If a high level api is
provided for generating the salt, then awesome. That helps a lot. Though I
would still suggest keeping it readable.

There are a ton of tradeoffs involved, and I appreciate what this team is
doing. Whatever the end result is, it will be adopted.

So thanks!

Anthony
Post by Solar Designer
Thanks again,
Alexander
Solar Designer
2015-09-27 22:15:15 UTC
Permalink
Post by Anthony Ferrara
Post by Solar Designer
Post by Anthony Ferrara
Going with one-off encodings does look elegant,
What do you mean by one-off here? That no other hashing scheme would
use the same encoding? Ideally, we'd come up with an encoding, whether
human-readable or not, that multiple schemes would adopt.
Unless I misread, it seemed like there was a suggestion to change the
encoding per parameter. Parameters that have no valid 0 would be encoded
where the first representable value is 1.
Oh, so by one-off you meant off-by-one.

Yes, that was one of my suggestions to be considered on top of the
compact B64 encoding for numeric parameters and especially if we allow
for defaults. However, this isn't exactly changing the encoding per
parameter - this is having per parameter minimums that must be passed
into the encoding and decoding functions for individual numeric
parameters. Robust implementations need such minimums anyway: for
checking that parameters are valid. With this proposal, the minimums
happen to be enforced automatically, since there would simply be no way
to encode a value below the minimum for each given parameter. Also,
deterministic encoding is achieved despite of allowing for defaults.

So you say this "makes the implementation significantly more error
prone". Can you explain? A possible error could be using an incorrect
minimum on a parameter, but this would fail the very first test (except
if the parameter is omitted and default value used in a given case)
since none of the possible values would be interpreted correctly.
Doesn't sound error prone to me, unless testing isn't performed at all.

I am not suggesting this for a human-readable encoding, as it goes
against the human-readable goal. I am only suggesting this for use
along with a compact encoding.
Post by Anthony Ferrara
Post by Solar Designer
Post by Anthony Ferrara
So my overall suggestion is make it easier on the developer using the
API, rather than the one writing it. After all, there are going to be
FAR more people writing a salt string then there will be writing the
backend implementation. Optimize for the greater use-case...
This makes sense to me. We will need to provide an API for generating
salt strings (or "setting strings", as they're called for bsdicrypt and
on) anyway, but you have a valid point that, especially if the encoding
is simple for an application developer to understand, many developers
will be generating such strings on their own.
For PHP and such, I think this pretty much implies the encoding will
also need to use RFC Base64. Then it's just sprintf of some decimal
numbers and some base64()'s. So my point remains that Thomas' proposed
encoding is neither here nor there.
Completely fair. I am less concerned around the encoding of the salt than
the encoding and specification of the parameters. If a high level api is
provided for generating the salt, then awesome. That helps a lot. Though I
would still suggest keeping it readable.
Regarding crypt B64 vs. RFC Base64, Alexander Cherepanov has just
pointed out to me that there are in fact major issues with common
base64 decoders. Thomas had hinted at that, but I didn't realize just
how bad things actually were. So I withdraw that suggestion for now.

Alexander
Solar Designer
2015-10-04 09:00:03 UTC
Permalink
Post by Solar Designer
Post by Anthony Ferrara
Post by Solar Designer
For PHP and such, I think this pretty much implies the encoding will
also need to use RFC Base64. Then it's just sprintf of some decimal
numbers and some base64()'s. So my point remains that Thomas' proposed
encoding is neither here nor there.
Completely fair. I am less concerned around the encoding of the salt than
the encoding and specification of the parameters. If a high level api is
provided for generating the salt, then awesome. That helps a lot. Though I
would still suggest keeping it readable.
Regarding crypt B64 vs. RFC Base64, Alexander Cherepanov has just
pointed out to me that there are in fact major issues with common
base64 decoders.
Alexander Cherepanov has since posted about it in here.
Post by Solar Designer
Thomas had hinted at that, but I didn't realize just
how bad things actually were. So I withdraw that suggestion for now.
So it looks like we've chosen to use a verbose encoding:

https://github.com/P-H-C/phc-string-format/blob/master/phc-sf-spec.md

and it looks like we've switched to (almost) RFC Base64:

"The B64 encoding is the standard Base64 encoding (RFC 4648, section 4)
except that the padding = signs are omitted, and extra characters
(whitespace) are not allowed"

As much as I am for compact encodings, I think if we chose to go verbose
anyway, we should not omit the padding '=' signs. Just to make it
easier to use standard encoding/decoding functions.

A reason to omit them is to ensure that '=' are only seen between
parameter names and values, but this is not necessary for parsing.
Even something like this may be parsed unambiguously:

$id$param1=123,param2=B64here==,param3=B64there=$salt==$hash=

It's not pretty, but it's simpler to create and parse when you already
have base64 encoding/decoding functions. Unfortunately, those decoding
functions on their own typically won't ensure strict syntax (they're
about as broken as strtoul() in this respect, or maybe slightly worse
even), but that may be a reason for us not to choose RFC Base64 rather
than to omit padding, I think.

Alexander
Jean-Philippe Aumasson
2015-10-04 09:06:32 UTC
Permalink
IMHO compatibility with b64 en/decoders is worth the extra ='s, and
sticking to the RFC probably simplifies the specs. But I don't have a
strong opinion on that subject.
Post by Dmitry Khovratovich
Post by Solar Designer
Post by Anthony Ferrara
Post by Solar Designer
For PHP and such, I think this pretty much implies the encoding will
also need to use RFC Base64. Then it's just sprintf of some decimal
numbers and some base64()'s. So my point remains that Thomas'
proposed
Post by Solar Designer
Post by Anthony Ferrara
Post by Solar Designer
encoding is neither here nor there.
Completely fair. I am less concerned around the encoding of the salt
than
Post by Solar Designer
Post by Anthony Ferrara
the encoding and specification of the parameters. If a high level api
is
Post by Solar Designer
Post by Anthony Ferrara
provided for generating the salt, then awesome. That helps a lot.
Though I
Post by Solar Designer
Post by Anthony Ferrara
would still suggest keeping it readable.
Regarding crypt B64 vs. RFC Base64, Alexander Cherepanov has just
pointed out to me that there are in fact major issues with common
base64 decoders.
Alexander Cherepanov has since posted about it in here.
Post by Solar Designer
Thomas had hinted at that, but I didn't realize just
how bad things actually were. So I withdraw that suggestion for now.
https://github.com/P-H-C/phc-string-format/blob/master/phc-sf-spec.md
"The B64 encoding is the standard Base64 encoding (RFC 4648, section 4)
except that the padding = signs are omitted, and extra characters
(whitespace) are not allowed"
As much as I am for compact encodings, I think if we chose to go verbose
anyway, we should not omit the padding '=' signs. Just to make it
easier to use standard encoding/decoding functions.
A reason to omit them is to ensure that '=' are only seen between
parameter names and values, but this is not necessary for parsing.
$id$param1=123,param2=B64here==,param3=B64there=$salt==$hash=
It's not pretty, but it's simpler to create and parse when you already
have base64 encoding/decoding functions. Unfortunately, those decoding
functions on their own typically won't ensure strict syntax (they're
about as broken as strtoul() in this respect, or maybe slightly worse
even), but that may be a reason for us not to choose RFC Base64 rather
than to omit padding, I think.
Alexander
Solar Designer
2015-09-27 22:47:48 UTC
Permalink
Post by Anthony Ferrara
So my overall suggestion is make it easier on the developer using the
API, rather than the one writing it. After all, there are going to be
FAR more people writing a salt string then there will be writing the
backend implementation. Optimize for the greater use-case...
Here's an idea in the above context:

We can have our crypt()-like API accept both compact and human-friendly
"setting strings", but output only compact encodings. This takes care
of making it easy for application developers to generate new hashes via
languages' existing crypt() API without waiting for a new API to be
introduced. It also prevents such application developers from
influencing the actual encoding (order of parameters seen in final
encodings, etc.) For example:

crypt("password", "$7X$logN=14,r=8")

as well as e.g.:

crypt("password", "$7X$p=1,r=8,N=16384")

could return something looking like:

$7X$B5$Pwm/zQAIhEVTKlaoJSA7TQ$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D

Of course, crypt() would also return the same string if called as:

crypt("password", "$7X$B5$Pwm/zQAIhEVTKlaoJSA7TQ")

With PHP's password_hash()/password_verify() API, the human-friendly
parsing would need to be enabled in password_hash(), but not
(necessarily) in password_verify(). Currently password_hash() accepts a
numeric $algo, but maybe we simply need to introduce PASSWORD_ANY (any
better name for it?) and have the actual choice made by a $setting
string that will follow.

Then we could also have an API for decoding compact to human-friendly
strings, but it would mostly be used for debugging and such, and it
wouldn't need to already be in place for apps to start using the new
hashes. And yes, potentially needing something like this for debugging
is a drawback of not using a human-friendly encoding everywhere.

Alexander
Anthony Ferrara
2015-09-28 19:03:24 UTC
Permalink
Alexander,
Post by Solar Designer
Post by Anthony Ferrara
So my overall suggestion is make it easier on the developer using the
API, rather than the one writing it. After all, there are going to be
FAR more people writing a salt string then there will be writing the
backend implementation. Optimize for the greater use-case...
We can have our crypt()-like API accept both compact and human-friendly
"setting strings", but output only compact encodings. This takes care
of making it easy for application developers to generate new hashes via
languages' existing crypt() API without waiting for a new API to be
introduced. It also prevents such application developers from
influencing the actual encoding (order of parameters seen in final
crypt("password", "$7X$logN=14,r=8")
crypt("password", "$7X$p=1,r=8,N=16384")
$7X$B5$Pwm/zQAIhEVTKlaoJSA7TQ$kBGj9fHznVYFQMEn/qDCfrDevf9YDtcDdKvEqHJLV8D
crypt("password", "$7X$B5$Pwm/zQAIhEVTKlaoJSA7TQ")
I guess my point here is more why does the compact encoding need to
exist? Why can't the encoding that was passed in be returned (assuming
it was valid in the first place)?

And is there really a need for a compact encoding in the first place?
I guess I always prefer explicitness over byte savings. Even in the
worst case situation where you're storing billions of passwords,
you're talking about saving perhaps 20 gigabytes out of a total of at
least 100gb. Yes, 20% is not insignificant, but even 20gb is a trivial
amount of data to basically any system (especially one with billions
of users).
Post by Solar Designer
With PHP's password_hash()/password_verify() API, the human-friendly
parsing would need to be enabled in password_hash(), but not
(necessarily) in password_verify(). Currently password_hash() accepts a
numeric $algo, but maybe we simply need to introduce PASSWORD_ANY (any
better name for it?) and have the actual choice made by a $setting
string that will follow.
So password_verify() will accept any arbitrary crypt() hash. That way
you pass in whatever, and it will confirm it for you.

password_hash() acts as a format-generator, a high-level API to
generate that setting string so you don't have to.

So I would forsee (with sane defaults for interactive usage):

$hash = password_hash($password, PASSWORD_ARGONI);

And if you wanted to specify parameters, they would go into the options array:

$hash = password_hash($password, PASSWORD_ARGONI, ["logN" => 15]);

The actual salt will be generated for you, as well as the string put
together. Whichever format it wants to generate doesn't matter to the
outside, that's all abstracted away.

The password_verify would accept any valid hash (so a migration path
to upgrade in-place could be implemented).

The one question I have is if we want to do a first-order
approximation of "cost" with an algorithm, so individual engineers
(the target of said API) don't need to learn about the individual
parameters and the tradeoffs. So we can say "use cost 10, but try 11
or 12 and see the runtime to see if it's OK for your server), and have
that cost value be derived into the appropriate settings for the
algorithm for that "cost". I don't know if this is a good idea or not,
just something I've been thinking about.

Remember, the target for password_* is normal web developers, not
security experts. They will not understand the tradeoffs that the
algorithms make (whether they should or not is irrelevant, they
won't). That's why password_* aims to be a high-level abstraction, not
just a re-invention of crypt (it serves a different consumer).
Post by Solar Designer
Then we could also have an API for decoding compact to human-friendly
strings, but it would mostly be used for debugging and such, and it
wouldn't need to already be in place for apps to start using the new
hashes. And yes, potentially needing something like this for debugging
is a drawback of not using a human-friendly encoding everywhere.
Yeah, that's my feeling. We're not talking about saving a huge amount
of storage space here. Even the largest websites in the world would be
saving on the scale of test of gigabytes. Small enough that it feels
like a micro-optimization to me. I'd rather the entire thing be human
readable.

But that's just my view. And I could very well be wrong here (or be
missing something).

Thanks

Anthony
Hugo Landau
2015-09-07 13:37:25 UTC
Permalink
I worked on this for scrypt, see > >
https://gitlab.com/jas/scrypt-unix-crypt/blob/master/unix-scrypt.txt > >
and I am interested in working on this for Argon2 too. > > I don't
believe it is important to include this in the official >
specification. It should be fine to keep it in a separate document, and
for a disjoint, or only partially overlapping group of people, to work
on that project. I do agree that a plethora of incompatible formats
is > a severe pain, but if a number of people now agree on a writeup and
starts to experiment, I believe we can get closure on something that >
should be "good enough" for others to accept. That said, I'm not >
opposed to including things in the official specification, if consensus
on details can be established. > > /Simon
I don't see that it's that important that things like the traditional
crypt() base64 encoding be used. Ultimately so long as a format can be
reliably disambiguated from other formats, is a reasonable length ASCII
string and doesn't contain newlines or ':' (for /etc/shadow
compatibility), I don't think there's an issue.

In ignorance of the above specification I have been using my own, which
looks like this:

$s2$N$r$p$salt$hash

salt and hash are RFC 4648 base64-encoded strings, and N, r and p are
ASCII decimal integers. The salt is properly decoded before use.
Passwords are UTF-8.

This seems like a simpler style of format.

As a format invented in 30 seconds, this could work for Argon2i:

$a2i$p$m$t$nonce$hash

This format assumes that K = "" and X = "" (or are contextually
provided), which will probably(?) be appropriate for most use cases.
Let tag length be 32 bytes. Let p, m and t be ASCII-encoded positive
decimal integers. Let nonce and hash use RFC 4648 base64 encoding. Let
the passphrase use UTF-8.

What are the issues with this, I wonder?
Solar Designer
2015-09-07 14:49:21 UTC
Permalink
Post by Hugo Landau
Post by Simon Josefsson
https://gitlab.com/jas/scrypt-unix-crypt/blob/master/unix-scrypt.txt
In ignorance of the above specification I have been using my own, which
$s2$N$r$p$salt$hash
salt and hash are RFC 4648 base64-encoded strings, and N, r and p are
ASCII decimal integers. The salt is properly decoded before use.
Passwords are UTF-8.
This seems like a simpler style of format.
$a2i$p$m$t$nonce$hash
Perhaps move p to after m and t.

Should there be a shortcut syntax where some parameter can be omitted
and a default used? e.g. have p and t default to the smallest supported
(which is generally what I would recommend for password hashing use
anyway). If so, it'd be preferable to use another delimiter between the
parameters.
Post by Hugo Landau
This format assumes that K = "" and X = "" (or are contextually
provided), which will probably(?) be appropriate for most use cases.
Let tag length be 32 bytes. Let p, m and t be ASCII-encoded positive
decimal integers. Let nonce and hash use RFC 4648 base64 encoding. Let
the passphrase use UTF-8.
What are the issues with this, I wonder?
No major issues. Something like this is within consideration. The use
of decimal and RFC 4648 base64 is scripting language friendly. The use
of decimal is human friendly (except for parameters that are naturally
binary, such as a set of flags).

If others go for an approach like this for PHC schemes now, I will
probably do the same for yescrypt for consistency.

Minor drawbacks are:

A few chars are wasted on padding, which is meaningless here (3 chars
with a 128-bit salt and a 256-bit hash, so not a big deal).

Similarly, larger numbers consume more chars in decimal than in crypt
base64 encoding, but that's also just a few chars more.

I'd expect many (possibly most) implementations not to insist on the
decimal numbers having proper syntax (digits only) and being in range
(except after decoding, where an overflow may have already occurred).
It's just too tempting to simply use atoi() or a scripting language's
equivalent, instead of proper use of strtoul() (with its somewhat weird
semantics) or checking the string against a regexp first (including a
length limit).

Alexander
Hugo Landau
2015-09-07 16:58:41 UTC
Permalink
Post by Solar Designer
Should there be a shortcut syntax where some parameter can be omitted
and a default used? e.g. have p and t default to the smallest
supported (which is generally what I would recommend for password
hashing use anyway). If so, it'd be preferable to use another
delimiter between the parameters.
I suspect not. It just doesn't seem worth it. The savings are tiny. In
this day and age of multi-terabyte hard disks, nobody, but nobody, is
complaining that they can't upgrade from their "md5(password)" disaster
because they can't afford the disk space. (The only real issue that
comes to mind is that some systems may have a hardcoded limit, perhaps
255 bytes; but that's miles away, and any such limit needs to be
reasoned about in terms of worst cases, not best cases.)
Post by Solar Designer
I'd expect many (possibly most) implementations not to insist on the
decimal numbers having proper syntax (digits only) and being in range
(except after decoding, where an overflow may have already occurred).
It's just too tempting to simply use atoi() or a scripting language's
equivalent, instead of proper use of strtoul() (with its somewhat weird
semantics) or checking the string against a regexp first (including a
length limit).
This is true. However, I would question the likelihood of an
implementation being presented with such crypt strings. If someone is in
a position to present arbitrary malicious crypt strings to an
implementation, you have bigger problems.

This seems like it is best rectified with a robust test suite which
offers both must-pass and must-fail test cases, with plenty of cases for
edge conditions on integer parsing.
Jean-Philippe Aumasson
2015-09-07 10:39:23 UTC
Permalink
Definitely needed. Any help welcome :-)

@Dmitry: were you planning to add this to the A2 specs?
Post by Thomas Pornin
Post by Hugo Landau
Now that a winner has been announced, I wondered if the PHC has any
interest in specifying a modular crypt format to supplement the final
specification for Argon2?
My opinion is that such a specification should really exist, and,
preferably, be included right into the "official specification" (maybe
as an appendix) and into the reference implementation(s) as well.
Lack of a definite, standard format indeed always leads to a plethora
of incompatible formats that cause severe headaches down the line
(e.g. when switching implementations but reusing an existing database
of hashed passwords).
If the Argon2 authors do not have time for that, I can contribute the
specification and code if needed (I have not written anything to that
effect yet for Argon2, but I did for Makwa, so I believe I can do that
job properly).
--Thomas Pornin
Dmitry Khovratovich
2015-09-07 12:47:58 UTC
Permalink
We are definitely interested in including all necessary formats to the spec
and the reference implementation, to ease the correct use of Argon2 and
make the incorrect use more difficult, in turn.

We are ready to implement the proposed encoding standard in the code, or
help to integrate, say, Thomas' code into the reference implementation.

There must be some simple and robust API coming with the format, for example
GenerateHash(NewPassword, UserData) -> (Digest, Salt, UserData)
VerifyHash(Password,UserData,Digest,Salt) -> true/false

Dmitry


On Mon, Sep 7, 2015 at 12:39 PM, Jean-Philippe Aumasson <
Post by Jean-Philippe Aumasson
Definitely needed. Any help welcome :-)
@Dmitry: were you planning to add this to the A2 specs?
Post by Thomas Pornin
Post by Hugo Landau
Now that a winner has been announced, I wondered if the PHC has any
interest in specifying a modular crypt format to supplement the final
specification for Argon2?
My opinion is that such a specification should really exist, and,
preferably, be included right into the "official specification" (maybe
as an appendix) and into the reference implementation(s) as well.
Lack of a definite, standard format indeed always leads to a plethora
of incompatible formats that cause severe headaches down the line
(e.g. when switching implementations but reusing an existing database
of hashed passwords).
If the Argon2 authors do not have time for that, I can contribute the
specification and code if needed (I have not written anything to that
effect yet for Argon2, but I did for Makwa, so I believe I can do that
job properly).
--Thomas Pornin
--
Best regards,
Dmitry Khovratovich
Jean-Philippe Aumasson
2015-09-11 13:40:13 UTC
Permalink
@Thomas: would you like to start drafting a specs for the hashes encoding?
Post by Thomas Pornin
Post by Hugo Landau
Now that a winner has been announced, I wondered if the PHC has any
interest in specifying a modular crypt format to supplement the final
specification for Argon2?
My opinion is that such a specification should really exist, and,
preferably, be included right into the "official specification" (maybe
as an appendix) and into the reference implementation(s) as well.
Lack of a definite, standard format indeed always leads to a plethora
of incompatible formats that cause severe headaches down the line
(e.g. when switching implementations but reusing an existing database
of hashed passwords).
If the Argon2 authors do not have time for that, I can contribute the
specification and code if needed (I have not written anything to that
effect yet for Argon2, but I did for Makwa, so I believe I can do that
job properly).
--Thomas Pornin
Thomas Pornin
2015-09-11 14:05:15 UTC
Permalink
Post by Jean-Philippe Aumasson
@Thomas: would you like to start drafting a specs for the hashes encoding?
Yes, I will do that this week-end.


--Thomas
Loading...