Discussion:
[PHC] Finalizing Catena, Lyra2, Makwa, yescrypt
Jean-Philippe Aumasson
2015-07-28 07:26:32 UTC
Catena, Lyra2, Makwa, yescrypt are PHC's special recognitions. We'll
deliver packages (specs+code) of their final versions in Q3. Meanwhile,
minor changes can be made to the algorithms, but we need to agree on those.

@Submitters of C, L, M, y: Do you intend to make any changes to your
algorithms? If so, which ones?

@Others/panel: What changes would you like to see in the finalists? Why?

Thanks!
Jakob Wenzel
2015-07-28 13:19:19 UTC
Post by Jean-Philippe Aumasson
Catena, Lyra2, Makwa, yescrypt are PHC's special recognitions.
We'll deliver packages (specs+code) of their final versions in Q3.
Meanwhile, minor changes can be made to the algorithms, but we need
to agree on those.
@Submitters of C, L, M, y: Do you intend to make any changes to
your algorithms? If so, which ones?
Hi,

we do not plan to make any changes to the Catena algorithm itself.
Nevertheless, we plan to update the reference implementation and the
specification paper: clarifying the description of Blake2b-1,
mentioning Catena-Axungia (a search tool for finding optimal parameters),
mentioning Catena-Variants (an implementation of Catena that allows
extension and modification), and correcting some indices to ensure
consistency throughout the whole document.

Best regards,
Jakob
--
Jakob Wenzel
Research Assistant
Chair of Media Security (Prof. Lucks)
Bauhausstraße 11 (Room 217)
99423 Weimar
Bill Cox
2015-07-28 13:24:11 UTC
I would love to see a variant of Catena refactored as a framework that can
work with other memory-hard hashing algorithms. For example, Catena-Argon2
would provide several features missing in Argon2, and Catena-Yescrypt would
also provide some new functionality. Features in Yescrypt and Argon2 that
are already in Catena could be removed to simplify them, and some features
that we need, such as encoding/decoding parameters to/from a string,
could be added in one place.

If Catena were refactored as a generic framework, I think I would use it in
most applications.

Bill
Bill Cox
2015-07-28 13:29:41 UTC
Another cute Catena framework feature might be building hybrid memory
hashing algorithms by combining password-independent and password-dependent
algorithms. For example, hash using Catena or Argon2i for 1/4 of the time,
followed by Argon2d or Yescrypt for 3/4 of the time. This would retain
most of the memory*time off-line defense, while providing a significant
improvement against side-channel attacks over Argon2d or Yescrypt alone.
It's not as good as a true hybrid algorithm, but it's not bad.
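Bill's 1/4 + 3/4 split can be illustrated with a toy sketch. All names and
the mixing function below are hypothetical stand-ins, not the real PHC
primitives; the only point is the two-phase structure:

```c
#include <stdint.h>
#include <string.h>

/* Toy 64-bit mixer standing in for a real compression function. */
static uint64_t toy_mix(uint64_t h, uint64_t v)
{
	h ^= v;
	h *= 0x9E3779B97F4A7C15ULL;
	return h ^ (h >> 32);
}

/* Phase 1: password-independent addressing (Catena / Argon2i style).
 * The memory address depends only on the loop counter, never on h. */
static uint64_t independent_phase(uint64_t h, uint64_t *mem, size_t n,
    size_t iters)
{
	for (size_t i = 0; i < iters; i++) {
		size_t j = i % n; /* address independent of the secret */
		mem[j] = toy_mix(mem[j], h);
		h = toy_mix(h, mem[j]);
	}
	return h;
}

/* Phase 2: password-dependent addressing (Argon2d / Yescrypt style).
 * The address depends on the running hash. */
static uint64_t dependent_phase(uint64_t h, uint64_t *mem, size_t n,
    size_t iters)
{
	for (size_t i = 0; i < iters; i++) {
		size_t j = (size_t)(h % n); /* secret-dependent address */
		mem[j] = toy_mix(mem[j], h);
		h = toy_mix(h, mem[j]);
	}
	return h;
}

/* Hybrid: 1/4 of the iterations side-channel-silent, 3/4 with
 * password-dependent addressing. */
uint64_t hybrid_hash(uint64_t seed, uint64_t *mem, size_t n,
    size_t total_iters)
{
	memset(mem, 0, n * sizeof *mem);
	uint64_t h = independent_phase(seed, mem, n, total_iters / 4);
	return dependent_phase(h, mem, n, total_iters - total_iters / 4);
}
```

Here the first quarter of the work leaks nothing through its access pattern,
while the remaining three quarters keep the TMTO resistance of
password-dependent addressing, which is the trade-off described above.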

Bill
Thomas Pornin
2015-07-28 14:38:12 UTC
Post by Jean-Philippe Aumasson
Catena, Lyra2, Makwa, yescrypt are PHC's special recognitions. We'll
deliver packages (specs+code) of their final versions in Q3. Meanwhile,
minor changes can be made to the algorithms, but we need to agree on those.
@Submitters of C, L, M, y: Do you intend to make any changes to your
algorithms? If so, which ones?
I have no planned changes to Makwa, neither to the algorithm nor to the
package. In my opinion, the specification and the reference
implementations are "production ready" as they are today.

(But if there is popular demand for some feature that actually makes
sense, I am open to discussion, of course.)


--Thomas Pornin
Solar Designer
2015-07-30 05:00:36 UTC
Post by Jean-Philippe Aumasson
Catena, Lyra2, Makwa, yescrypt are PHC's special recognitions. We'll
deliver packages (specs+code) of their final versions in Q3. Meanwhile,
minor changes can be made to the algorithms, but we need to agree on those.
@Submitters of C, L, M, y: Do you intend to make any changes to your
algorithms? If so, which ones?
I am considering these tweaks to yescrypt:

1. A trivial change to how t is updated on hash upgrades. (In v1, t is
reset to 0 on the first hash upgrade, to maximize the area-time product
growth on upgrades - but I've since realized that in some cases, such as
when t was very high initially, this strategy is not convenient.)

2. Reverse order of sub-blocks to mitigate Argon team's depth*memory
tradeoff attack (which might be practical at low yet non-negligible
memory reduction factors like 1/2 or 1/3). I would then appreciate
review of the revised algorithm by the Argon and Lyra2 teams, if they
are willing to.

3. Re-introduction of TMTO-resistant pwxform S-box initialization
(dropped between v0 and v1 for simplicity of a potential yescrypt-lite,
while not simplifying the full yescrypt) or/and introduction of S-box
overwrites in BlockMix_pwxform.

4. A lower Salsa20 rounds count (likely 2 instead of 8) for its use in
BlockMix_pwxform.

While I am considering these, I will not necessarily make all 4.
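For reference on tweak 4, here is a portable (non-SIMD) sketch of the
standard Salsa20 core with a configurable round count, showing what the
move from Salsa20/8 to Salsa20/2 changes. This is the core from the
Salsa20 specification, not yescrypt's optimized code; the function name
is mine:

```c
#include <stdint.h>
#include <string.h>

#define R(a, b) (((a) << (b)) | ((a) >> (32 - (b))))

/* Standard Salsa20 core: `rounds` must be even (2 for Salsa20/2,
 * 8 for Salsa20/8). Each loop iteration is one column round followed
 * by one row round, with the usual feed-forward addition at the end. */
void salsa20_core(uint32_t out[16], const uint32_t in[16], int rounds)
{
	uint32_t x[16];
	int i;

	memcpy(x, in, sizeof(x));
	for (i = 0; i < rounds; i += 2) {
		/* Column round */
		x[ 4] ^= R(x[ 0]+x[12], 7);  x[ 8] ^= R(x[ 4]+x[ 0], 9);
		x[12] ^= R(x[ 8]+x[ 4],13);  x[ 0] ^= R(x[12]+x[ 8],18);
		x[ 9] ^= R(x[ 5]+x[ 1], 7);  x[13] ^= R(x[ 9]+x[ 5], 9);
		x[ 1] ^= R(x[13]+x[ 9],13);  x[ 5] ^= R(x[ 1]+x[13],18);
		x[14] ^= R(x[10]+x[ 6], 7);  x[ 2] ^= R(x[14]+x[10], 9);
		x[ 6] ^= R(x[ 2]+x[14],13);  x[10] ^= R(x[ 6]+x[ 2],18);
		x[ 3] ^= R(x[15]+x[11], 7);  x[ 7] ^= R(x[ 3]+x[15], 9);
		x[11] ^= R(x[ 7]+x[ 3],13);  x[15] ^= R(x[11]+x[ 7],18);
		/* Row round */
		x[ 1] ^= R(x[ 0]+x[ 3], 7);  x[ 2] ^= R(x[ 1]+x[ 0], 9);
		x[ 3] ^= R(x[ 2]+x[ 1],13);  x[ 0] ^= R(x[ 3]+x[ 2],18);
		x[ 6] ^= R(x[ 5]+x[ 4], 7);  x[ 7] ^= R(x[ 6]+x[ 5], 9);
		x[ 4] ^= R(x[ 7]+x[ 6],13);  x[ 5] ^= R(x[ 4]+x[ 7],18);
		x[11] ^= R(x[10]+x[ 9], 7);  x[ 8] ^= R(x[11]+x[10], 9);
		x[ 9] ^= R(x[ 8]+x[11],13);  x[10] ^= R(x[ 9]+x[ 8],18);
		x[12] ^= R(x[15]+x[14], 7);  x[13] ^= R(x[12]+x[15], 9);
		x[14] ^= R(x[13]+x[12],13);  x[15] ^= R(x[14]+x[13],18);
	}
	for (i = 0; i < 16; i++)
		out[i] = x[i] + in[i];
}
```

Salsa20/8 is scrypt's BlockMix default; in BlockMix_pwxform the lower
round count would only provide inter-sub-block mixing alongside pwxform.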

The common theme in potential tweaks 3 and 4 (and maybe 2) is that since
yescrypt wasn't selected as the winner anyway, I don't have to focus on
allowing for a much simpler yescrypt-lite subset (hopefully, Argon2 with
MAXFORM will fill that space... although without MAXFORM, it won't).
It would still be possible to have a yescrypt-lite; it would just be
slightly more complex than what's possible with the current yescrypt v1.

We'll also need to agree on currently recommended sets of pwxform
parameters (up to 3 sets), or on one such set.

There should also be improvements to specs and code beyond what I listed
above, but those would not be changes to the algorithm. Just clearer
specification and more robust code with improved APIs.

Alexander
Solar Designer
2015-10-04 11:47:37 UTC
[...]
Post by Solar Designer
2. Reverse order of sub-blocks to mitigate Argon team's depth*memory
tradeoff attack (which might be practical at low yet non-negligible
memory reduction factors like 1/2 or 1/3). I would then appreciate
review of the revised algorithm by the Argon and Lyra2 teams, if they
are willing to.
3. Re-introduction of TMTO-resistant pwxform S-box initialization
(dropped between v0 and v1 for simplicity of a potential yescrypt-lite,
while not simplifying the full yescrypt) or/and introduction of S-box
overwrites in BlockMix_pwxform.
Out of the above, after having experimented with various (unpublished)
revisions, I am now leaning towards "introduction of S-box overwrites in
BlockMix_pwxform" only. It may eliminate the need for the rest of the
above. Here's why:

"Reverse order of sub-blocks" sounds easy, but it appears to achieve
quite little (in an attack, storing just a few sub-blocks per
not-fully-stored block appears sufficient to undo most of the added
recomputation latency), and it results in Integerify() having to be
defined differently than scrypt's (using the first vs. the last
sub-block), which complicates the spec (if classic scrypt support is
also included). On top of this, my testing shows a slight performance
impact from the reversed order of sub-block writes. Going for scrypt's
sub-block shuffling (which would have allowed keeping Integerify()
common) appears to achieve about as little, but results in a couple of
special cases to specify for when the pwxform block size isn't the same
as the Salsa20 block size. (They are the same with yescrypt's current
pwxform defaults, but making this the only case would lose much of
yescrypt's low-level scalability.)

S-box overwrites in BlockMix_pwxform may defeat the same kind of attacks
that sub-block shuffling would try to mitigate. The overwrites would
not directly deal with those attacks - the "pipelining" would still work
just as well - however, having to store or/and recompute (portions of)
the S-boxes may introduce enough extra storage needs or/and extra
latency that the attacks would not be worthwhile overall. Per Dmitry's
analysis, for yescrypt with r=8 (as recommended), only 1/2 and 1/3 memory
reductions were a concern (with 1/4 having a computation penalty of over
1000 times), so a 3x increase in the storage-latency product for
mounting those attacks appears sufficient to make them ineffective.

S-box overwrites in BlockMix_pwxform may also eliminate the (potential)
need for "TMTO-resistant pwxform S-box initialization". TMTO resistance
at that level (where an attack would have allowed a GPU-like device with
otherwise too little local memory per instance, like 4 KB instead of
8 KB, to operate reasonably efficiently) may also be achieved through
having the S-box contents gain extra data dependencies early on as the
main SMix1 runs, rather than during S-box initialization.

Also, I realized that current pwxform unnecessarily under-uses the
write port(s) to L1 cache. On typical CPUs, there are two read ports
and one write port to L1 cache, whereas current pwxform does almost
exclusively reads (with the only writes being to the current output
block, as well as implicitly through function calls and occasional
register spilling). I was concerned about write-through L1 caches found
in some CPUs. However, testing on Bulldozer (which has write-through L1
along with a smaller write coalescing cache) shows that sequential
writes are OK (and that random writes are not OK). It does behave
slower than Intel CPUs with write-back L1, but not unacceptably so
(except with random or otherwise non-sequential writes, which I am not
going to use). I was also concerned about that cached data being
written out to memory once in a while, thereby wasting memory bandwidth.
However, even if this is happening, I am not seeing it result in much of
a yescrypt speed reduction (testing on Bulldozer, Sandy Bridge,
Haswell), so probably it isn't happening a lot and/or not at times when
yescrypt demands that bandwidth. It is a typical enough case (working
on some local and some global data simultaneously) that CPUs optimize
it. And yes, this also means that Argon2's 1 KB of temporary state is
probably OK (albeit wasteful since it didn't have to be just temporary,
but could affect next block's computation).

Here's what I am able to fit within +/-10% of yescrypt 0.7.1's
performance (mostly as tested for max threads at 2 MB per independent
instance, but also tested with other memory sizes and single thread):

Option 1: reduce PWXrounds from 6 to 5, but introduce a third S-box
(thus, e.g. 8 KB to 12 KB total) and rapid S-box writes in between rounds
(writing to the third S-box, which is inactive for reads at that time,
and then rotating the 3 S-boxes at each sub-block). This means that 12
gather loads (6 rounds times 2 gather loads) are replaced with 10 gather
loads and 4 sequential stores. (Counted in sub-block sized operations.)
The total L1 bandwidth usage increases by a factor of 7/6, and the GPU
local memory attack resistance by a factor of 3/2*5/6 = 5/4 (even
assuming the stores are somehow free, which they are not). Drawbacks
are that multiplication latency hardening decreases to 5/6, and that
GPU(-like) global memory attacks (if the S-boxes would be put in global
memory anyway) might become up to 6/5 times faster (again, assuming the
sequential and perfectly coalesced stores are somehow free).
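The factors above can be double-checked with exact integer arithmetic (a
sanity check only; "sub-block sized operations" as in the text, and the
function names are mine):

```c
/* Sub-block-sized L1 operations per pwxform application (Option 1):
 * before: 6 rounds x 2 gather loads; after: 5 rounds x 2 gather loads
 * plus 4 sequential stores. */
int l1_ops_before(void) { return 6 * 2; }
int l1_ops_after(void)  { return 5 * 2 + 4; }

/* GPU local-memory attack resistance changes by
 * (12 KB / 8 KB) * (5 rounds / 6 rounds) = 3/2 * 5/6 = 5/4,
 * returned as an exact fraction to avoid rounding. */
void resistance_factor(int *num, int *den)
{
	*num = 12 * 5; /* new S-box total size * new rounds */
	*den = 8 * 6;  /* old S-box total size * old rounds */
}
```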

A question is whether such rapid writes are in fact needed. We make
great use of the L1 cache write ports, but in terms of defeating
sub-block and S-box TMTO attacks we probably don't need the writes to be
this rapid. With the example settings above, we write 4x more data to
the S-boxes than to the output block (so we use 5x the current write
bandwidth overall). In other words, with 1 KB blocks (r=8) we fully
replace the 12 KB S-box contents per every 3 blocks written. To defeat
the sub-block TMTO attack it would probably be enough to write as much
data into the S-boxes as we write into the output block (although it
matters what data we write: how quick it is to recompute given what
other data, or vice versa), and to defeat TMTO attacks on the S-boxes
it'd probably be enough to overwrite and entangle the S-boxes just once
(yet do it early on).
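The "every 3 blocks" figure above can be made concrete, assuming
yescrypt's default 64-byte pwxform sub-block, r=8 (1 KB blocks), and the
4 sub-block-sized stores per sub-block from Option 1:

```c
/* Bytes written into the S-boxes per 1 KB block: 16 sub-blocks of
 * 64 bytes, with 4 sub-block-sized stores each. */
int sbox_bytes_per_block(void)
{
	int subblocks_per_block = 1024 / 64; /* r = 8 -> 1 KB blocks */
	int stores_per_subblock = 4;
	return subblocks_per_block * stores_per_subblock * 64;
}
```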

Anyway, with this combination of settings, performance on Haswell is the
same as or slightly better than before (e.g. 5% better for the 2 MB
test), and performance on Bulldozer is the same as or slightly worse
than before (e.g. 8% worse for the 2 MB test). This is without using
AVX2. (These changes
should make AVX2 more of a help, which is both good and bad.)

I haven't tested that yet, but I think this will also work well for
using L2 cache with AVX2-optimized pwxform settings and implementation
on Haswell, where previously 64 KB worked well (it would be 96 KB now).

Valid PWXrounds settings for this design are "a power of 2, plus 1", so
e.g. 2, 3, 5. This is to avoid having to check/mask the write offset
against the limit too frequently (can't hit the limit in most places),
and to skip the S-box write after the last round (which is followed by a
write to the block).
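The constraint can be expressed as a small check (my own formulation:
with a write in between rounds and none after the last round, there are
rounds - 1 S-box writes per sub-block, and making that count a power of
two means the write offset can only hit the S-box size limit at a
sub-block boundary):

```c
/* Valid PWXrounds for Option 1: "a power of 2, plus 1", e.g. 2, 3, 5. */
int pwxrounds_valid(int rounds)
{
	int writes = rounds - 1; /* inter-round writes per sub-block */
	return writes > 0 && (writes & (writes - 1)) == 0;
}
```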

Option 2: since the above means introduction of "pwxform context"
anyway, it now becomes straightforward to add similar handling of a
larger S-box, targeting L2 cache. Reduce PWXrounds from 6 to 4,
introduce a third small S-box (thus, e.g. 8 KB to 12 KB total) like
above, and introduce a 128 KB S-box also including the smaller 3
S-boxes as its part. Have the rest of the larger S-box (beyond the
12 KB holding the smaller S-boxes) lazy-initialized during the main SMix
operation (keep the current bitmask in the pwxform context). Unlike the
above, perform only two S-box writes per sub-block: one write to the
currently inactive small S-box (which gets activated at next sub-block,
like above), and one write to the remainder of the large S-box. (This
means maintaining a sequential write offset for the larger S-box, and
masking it for writes to the smaller S-box.) These are the extra
actions in two of the four rounds. In two other rounds, the extra
actions are cache line sized random lookups against the large S-box.
Alternatively, random lookups against the large S-box are performed in
all four rounds, which Haswell manages but the other two CPUs not so
well. This looks something like:

#define PWXFORM_ROUND_READ \
	r = EXTRACT64(X0) & m; \
	PREFETCH(S + r, _MM_HINT_T0) \
	PWXFORM_SIMD(X0, x0, s00, s01) \
	PWXFORM_SIMD(X1, x1, s10, s11) \
	PWXFORM_SIMD(X2, x2, s20, s21) \
	X0 = _mm_add_epi64(X0, *(__m128i *)(S + r)); \
	X1 = _mm_add_epi64(X1, *(__m128i *)(S + r + 16)); \
	X2 = _mm_add_epi64(X2, *(__m128i *)(S + r + 32)); \
	PWXFORM_SIMD(X3, x3, s30, s31) \
	X3 = _mm_add_epi64(X3, *(__m128i *)(S + r + 48));

#define PWXFORM_ROUND_RW(Sw, w) \
	r = EXTRACT64(X0) & m; \
	PREFETCH(S + r, _MM_HINT_T0) \
	PWXFORM_SIMD(X0, x0, s00, s01) \
	PWXFORM_SIMD(X1, x1, s10, s11) \
	PWXFORM_SIMD(X2, x2, s20, s21) \
	X0 = _mm_add_epi64(X0, *(__m128i *)(S + r)); \
	X1 = _mm_add_epi64(X1, *(__m128i *)(S + r + 16)); \
	PWXFORM_SIMD(X3, x3, s30, s31) \
	X2 = _mm_add_epi64(X2, *(__m128i *)(S + r + 32)); \
	X3 = _mm_add_epi64(X3, *(__m128i *)(S + r + 48)); \
	*(__m128i *)(Sw + (w)) = X0; \
	*(__m128i *)(Sw + (w) + 16) = X1; \
	*(__m128i *)(Sw + (w) + 32) = X2; \
	*(__m128i *)(Sw + (w) + 48) = X3;

#define PWXFORM_START \
{ \
	PWXFORM_X_T x0, x1, x2, x3; \
	__m128i s00, s01, s10, s11, s20, s21, s30, s31; \
	uint32_t r; \
	PWXFORM_ROUND_RW(S, w) \
	PWXFORM_ROUND_READ \
	PWXFORM_ROUND_RW(S2, w & Smask) \
	w += 64; \
	PWXFORM_ROUND_READ \
	{ \
		uint8_t * Stmp = S2; \
		S2 = S1; \
		S1 = S0; \
		S0 = Stmp; \
	} \
}

#define PWXFORM_END \
{ \
	if (__builtin_expect(!(w & (w - 1)), 0)) { \
		m |= w - PWXbytes; \
		if (w == Sbytes) w = Sinit; \
	} \
}

#define PWXFORM \
	PWXFORM_START PWXFORM_END

(PWXFORM_END may be skipped in places where it is known the condition
can't be true, such as on odd iterations of an unrolled loop.)

This requires attackers to either provide 128 KB of (semi-)local memory
per instance, or transfer that data off-chip. The latter approach would
increase the attack chip's external bandwidth needs by a factor of 2.5.
(We had 1 read and 1 write per block, so 2 total. Now we also have 2
reads and 1 write to the large S-box, so 5 total. In fact, in the
Haswell-enabled variation pasted above, it's 4 rather than 2 reads from
the large S-box, so 7 total, and a 3.5x increase.)
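The transfer accounting above reduces to simple per-block counts (a
sanity check; the function names are mine):

```c
/* External memory transfers per block for an attacker who does not keep
 * the 128 KB S-box on-chip (Option 2). */
int transfers_baseline(void) { return 1 + 1; } /* block read + block write */
int transfers_option2(void)  { return transfers_baseline() + 2 + 1; } /* +2 reads, +1 write */
int transfers_haswell(void)  { return transfers_baseline() + 4 + 1; } /* +4 reads, +1 write */
```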

The choice of 128 KB reflects how much L2 cache we have per thread
on many modern CPUs. Bulldozer could do more (1 MB L2 per "core"),
Knights Corner/Landing would probably need less (256 KB per 4 threads,
so 64 KB). 128 KB works great on Haswell, but is a few percent slower
than 64 KB on Sandy Bridge; I don't know exactly why (maybe something to
do with when L1 lines also have to be present in L2, although neither
CPU has L2 forcibly inclusive of L1, as far as I could find).

Haswell runs this so well that we could keep PWXrounds = 5 rather than
go for 4. Older CPUs sort of need 4.

Putting L2 cache to use while also doing smaller lookups from L1 is
great, but this has some drawbacks: a further reduction of PWXrounds
(where the move from 8 KB to 12 KB barely compensates for it in terms of
anti-GPU, and the reduction in multiplication latency hardening is only
maybe compensated for by L2 cache latency hardening), PWXrounds becomes
inflexible (it sort of has to be 4, or maybe 5 if we re-add one simple
pwxform round after these 4 rounds with extras), and overall this looks
specialized rather than elegant (even though it is still quite flexible
in terms of being able to tune the widths and sizes to fit a wide
variety of future devices).

So what do we prefer - the more elegant option 1 (with crazy frequent
S-box writes, because we can?) or the more extensive option 2?

As a more conservative alternative to either of these, it may also be
reasonable to keep PWXrounds at 6 while introducing the third small
S-box, with not-so-frequent writes, but this appears to run slightly
slower than the version with 5 rounds and the crazy writes. Given that
we can have the crazy writes for less performance cost than 1 round on
current CPUs, perhaps we should? Also, "one write per round" might be
more elegant than "one write after PWXrounds/2 rounds".

Introduction of the third S-box, besides being good for anti-GPU on its
own (1.5x larger total almost for free, except on Bulldozer-alikes), is
closely related to enabling the writes. Without it, the writes would
have to be to an S-box currently active for reads as well, which reduces
(defensive) implementation flexibility (in particular, it wouldn't let
register-starved architectures compute fewer lanes at once, only needing
to sync once per sub-block; allowing this is a deliberate design goal).

Sorry for the long message. I'd appreciate any comments, to any part of
the message (as I understand most of us wouldn't have time to comment on
the whole thing at once).

Alexander
Solar Designer
2015-10-06 11:45:00 UTC
Post by Solar Designer
S-box overwrites in BlockMix_pwxform may defeat the same kind of attacks
that sub-block shuffling would try to mitigate. The overwrites would
not directly deal with those attacks - the "pipelining" would still work
just as well
I mean it would still work just as well when no shuffling is done.
When shuffling is done as well, the S-box overwrites do also work
against "pipelining". It's just that this combination of both
mitigations appears excessive, and the S-box overwrites appear to be
more effective on their own than shuffling does on its own.
Post by Solar Designer
- however, having to store or/and recompute (portions of)
the S-boxes may introduce enough extra storage needs or/and extra
latency that the attacks would not be worthwhile overall.
Here's what I am able to fit within +/-10% of yescrypt 0.7.1's
performance (mostly as tested for max threads at 2 MB per independent
Option 1: reduce PWXrounds from 6 to 5, but introduce a third S-box
(thus, e.g. 8 KB to 12 KB total) and rapid S-box writes inbetween rounds
(writing to the third S-box, which is inactive for reads at that time,
and then rotating the 3 S-boxes at each sub-block).
I've since managed to retain PWXrounds = 6 yet also add the third S-box
and the 4 stores. Due to some other optimizations, this achieves about
the same performance as yescrypt 0.7.1 on Sandy Bridge and Haswell.
Unfortunately, not on Bulldozer: 4% (1 GB KDF) to 13% (2 MB password
hashing) performance impact there (most of it from going from 8 KB to
12 KB, and not from the stores). Yet I think this is acceptable.
Post by Solar Designer
This means that 12
gather loads (6 rounds times 2 gather loads) are replaced with 10 gather
loads and 4 sequential stores. (Counted in sub-block sized operations.)
Now it's 12 gather loads alone, vs. 12 gather loads and 4 sequential
stores (the stores are almost free in this code on these 3 CPUs).
Post by Solar Designer
The total L1 bandwidth usage increases by a factor of 7/6, and the GPU
Now it's 4/3.
Post by Solar Designer
local memory attack resistance by a factor of 3/2*5/6 = 5/4 (even
assuming the stores are somehow free, which they are not).
Now it's 3/2, assuming same speed as before (so not on Bulldozer, where
this improvement is smaller) and ignoring possible additional effect of
the stores.
Post by Solar Designer
Drawbacks
are that multiplication latency hardening decreases to 5/6, and that
GPU(-like) global memory attacks (if the S-boxes would be put in global
memory anyway) might become up to 6/5 times faster (again, assuming the
sequential and perfectly coalesced stores are somehow free).
These drawbacks no longer apply, except that the slight performance loss
on Bulldozer might help some attackers (those whose hardware would not
be impacted to at least the same extent - e.g., Intel CPUs).
Post by Solar Designer
Valid PWXrounds settings for this design are "a power of 2, plus 1", so
e.g. 2, 3, 5.
It's "a power of 2, plus 2" now, so 3, 4, 6, 10.
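The updated rule can be expressed as a small validity check. My reading
(an assumption, not stated explicitly above) is that the per-sub-block
S-box store count is now rounds - 2, and that count must again be a
power of two:

```c
/* Updated constraint: PWXrounds = "a power of 2, plus 2",
 * e.g. 3, 4, 6, 10 (assumed rationale: rounds - 2 stores per
 * sub-block must be a power-of-two count). */
int pwxrounds_valid_v2(int rounds)
{
	int writes = rounds - 2;
	return writes > 0 && (writes & (writes - 1)) == 0;
}
```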
Post by Solar Designer
Option 2: since the above means introduction of "pwxform context"
anyway, it now becomes straightforward to add similar handling of a
larger S-box, targeting L2 cache. Reduce PWXrounds from 6 to 4,
introduce a third small S-box (thus, e.g. 8 KB to 12 KB total) like
above, and introduce a 128 KB S-box also including the smaller 3
S-boxes as its part. Have the rest of the larger S-box (beyond the
12 KB holding the smaller S-boxes) lazy-initialized during the main SMix
operation (keep the current bitmask in the pwxform context). Unlike the
above, perform only two S-box writes per sub-block: one write to the
currently inactive small S-box (which gets activated at next sub-block,
like above), and one write to the remainder of the large S-box. (This
means maintaining a sequential write offset for the larger S-box, and
masking it for writes to the smaller S-box.) These are the extra
actions in two of the four rounds. In two other rounds, the extra
actions are cache line sized random lookups against the large S-box.
Alternatively, random lookups against the large S-box are performed in
all four rounds
I've since managed to introduce multiply-adds into the L2 cache reading
rounds as well, so the total multiplication latency hardening remains at
6 (or even becomes 8) per sub-block while achieving similar performance
as before.

The remaining major drawback is that the random lookups become less
frequent - there's still the rounds count reduction from 6 to 4 in that
respect. For GPU attacks that placed the S-boxes in global memory
anyway, this reduction in frequency of random lookups would result in a
speed advantage. With 6 rounds, 2 active S-boxes, and 4 gather lanes,
we had 48 random lookups per sub-block. With 4 rounds, we have just
32+2=34 or 32+4=36. (Counting the L2 lookups the same as L1 ones here,
as they would be similar if all are done from GPU global memory anyway.)
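The lookup counts tabulate as rounds x 2 active S-boxes x 4 gather
lanes, plus any L2 lookups (a sanity check; the function name is mine):

```c
/* Random S-box lookups per sub-block. */
int lookups_per_subblock(int rounds, int l2_lookups)
{
	int active_sboxes = 2, gather_lanes = 4;
	return rounds * active_sboxes * gather_lanes + l2_lookups;
}
```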

This may well be the primary reason why I probably won't go for
introducing an L2 layer into yescrypt now. Other reasons include the
complexity increase, and the design starting to look specialized
rather than generic.

That's a pity, as having an L2 layer in the arsenal of defenses could
help against some attack hardware, including specific GPUs that would
have preferred to use local memory otherwise.
Post by Solar Designer
This requires attackers to either provide 128 KB of (semi-)local memory
per instance, or transfer that data off-chip. The latter approach would
increase the attack chip's external bandwidth needs by a factor of 2.5.
(We had 1 read and 1 write per block, so 2 total. Now we also have 2
reads and 1 write to the large S-box, so 5 total. In fact, in the
Haswell-enabled variation pasted above, it's 4 rather than 2 reads from
the large S-box, so 7 total, and a 3.5x increase.)
The choice of 128 KB is since it's how much L2 cache we have per thread
on many modern CPUs. Bulldozer could do more (1 MB L2 per "core"),
I tested that Bulldozer works reasonably well for up to 512 KB used as
yescrypt's L2 per thread. There is performance decline relative to
smaller sizes, but it's not sharp (until the 512 KB to 1 MB increase).
Post by Solar Designer
Knights Corner/Landing would probably need less (256 KB per 4 threads,
so 64 KB).
I wrongly listed Knights Landing here - apparently, it will have 1 MB of
L2 cache per 2 cores, so 128 KB per thread.
Post by Solar Designer
128 KB works great on Haswell, but is a few percent slower
than 64 KB on Sandy Bridge; I don't know exactly why (maybe something to
do with when L1 lines also have to be present in L2, although neither
CPU has L2 forcibly inclusive of L1, as far as I could find).
Also, 128 KB results in lower and less stable speeds than 64 KB when
using a large ROM (multi-GB on huge pages). Maybe the page table
entries significantly benefit from being in L2 rather than in L3, or
something.
Post by Solar Designer
So what do we prefer - the more elegant option 1 (with crazy frequent
S-box writes, because we can?) or the more extensive option 2?
I am leaning towards staying with a variation of option 1 at this time.

Alexander

Marcos Antonio Simplicio Junior
2015-07-30 14:47:47 UTC
Hi, JP.

We have no changes planned for Lyra2, unless there is some specific request.

We are currently working on a hardware implementation of BlaMka, so it may change if we perceive some clear software-oriented optimization. However, since the underlying sponge is independent of Lyra2's design, I believe that does not qualify as a "change to Lyra2" :-)

BR,

Marcos.
----- Original message -----
Sent: Tuesday, 28 July 2015 04:26:32
Subject: [PHC] Finalizing Catena, Lyra2, Makwa, yescrypt
Catena, Lyra2, Makwa, yescrypt are PHC's special recognitions. We'll
deliver packages (specs+code) of their final versions in Q3.
Meanwhile, minor changes can be made to the algorithms, but we need
to agree on those.
@Submitters of C, L, M, y: Do you intend to make any changes to your
algorithms? If so, which ones?
@Others/panel: What changes would you like to see in the finalists?
Why?
Thanks!