SHA1 [C++]

Prev: C++ Concepts
Next: Portable way to write binary data

From: gkoreman on 20 Nov 2005 12:06

I am trying to implement SHA1 based on the pseudo-code on Wikipedia.
The pseudo-code is on:
http://en.wikipedia.org/wiki/SHA-1

My code is working, but is not giving me the correct hashes.
This is being tested (initially) on a big-endian machine, and only
after I have it working on big-endian was I planning on making it work
on both big and little endian.

#include <stdio.h>
#include <vector>
#include <string>
#include <cmath>
using namespace std;

#define ROTL(n,X) ( ( ( X ) << n ) | ( ( X ) >> ( 32 - n ) ) )

typedef unsigned int word32;

class SHA1Hash
{
private:

void Process(std::vector<word32>& message)
{
// Pre-processing:
// 1. append a single "1" bit to message
// 2. append "0" bits until message length = 490 = -32
(mod 512)
// 3. append length of message (before pre-processing),
in bits as 32-bit big-endian integer to message

word32 messageLength = message.size() * 32;

// Step 1
word32 padding = 0x8000000; // 1000 0000 0000 0000
message.push_back(padding);

// Step 2
padding = 0;
while( (message.size() % 16) != 15 )
message.push_back(padding);

// Step 3
message.push_back(messageLength);

//
// Actual Hash Code
//

unsigned int chunkIndex = 0;
//Initialize variables:

h0 = 0x67452301;
h1 = 0xEFCDAB89;
h2 = 0x98BADCFE;
h3 = 0x10325476;
h4 = 0xC3D2E1F0;

// break message into 512-bit chunks
while( chunkIndex < message.size() )
{
// break chunk into sixteen 32-bit words w(i),
0 = i = 15
vector<word32> w(80, 0);
for( unsigned int i = 0; i < 16; i++ )
w[i] = message[chunkIndex + i];

for( unsigned int i = 16; i < 80; i++ )
w[i] = ROTL( 1, (w[i-3] ^ w[i-8] ^
w[i-14] ^ w[i-16]));

word32 a = h0;
word32 b = h1;
word32 c = h2;
word32 d = h3;
word32 e = h4;

for( unsigned int i = 0; i < 80; i++ )
{
word32 f = 0;
word32 k = 0;
if( i < 20 )
{
f = (b & c) | ((~b) & d);
k = 0x5A827999;
}
else if( i < 40 )
{
f = b ^ c ^ d;
k = 0x6ED9EBA1;
}
else if( i < 60 )
{
f = (b & c) | (b & d) | (c &
d);
k = 0x8F1BBCDC;
}
else if( i < 80 )
{
f = b ^ c ^ d;
k = 0xCA62C1D6;
}

word32 temp = ROTL(5, a) + f + e + k +
w[i];
e = d;
d = c;
c = ROTL(30, b);
b = a;
a = temp;
}

// Add this chunk's hash to result so far:
h0 = h0 + a;
h1 = h1 + b;
h2 = h2 + c;
h3 = h3 + d;
h4 = h4 + e;

chunkIndex+=16;
}
}

... a public function that calls process

I have tried it with an empty vector and get
5bc0ce0 b2008157 3de49bed 97b3e936 af3f86ce

This is what I should get:
da39a3ee5e6b4b0d3255bfef95601890afd80709

I have looked through this for hours now and can't see anything wrong
with it. Can anyone give me a hand?

btw, this is NOT homework, this is a personal project.

Thanks

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Carlos Moreno on 21 Nov 2005 04:55

gkoreman(a)gmail.com wrote:

> I have tried it with an empty vector and get
> 5bc0ce0b20081573de49bed97b3e936af3f86ce
>
> This is what I should get:
> da39a3ee5e6b4b0d3255bfef95601890afd80709
>
> I have looked through this for hours now and can't see anything wrong
> with it. Can anyone give me a hand?

Two things -- your string is shorter than the expected/required
160 bits. I removed the spaces from your output, and the end is
not aligned with the end of what you should get (I'm looking at
it with non-proportional spacing font). Most likely, this is
because of one of your cout statements that is supposed to output
0X (where X is whatever digit), and it's choosing to output X
alone, omiting the leading 0.

Also, pay attention to endianness -- perhaps after adding the
presumably missing 0, you may notice a pattern of reversed bytes;
the SHA1 description talks about concatenated and expressed as
big-endian -- if you ran this on an x86 processor and did not
pay attention to endianness, you might end up getting the wrong
result.

HTH,

Carlos
--

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Le Chaud Lapin on 21 Nov 2005 04:51

gkoreman(a)gmail.com wrote:
> I am trying to implement SHA1 based on the pseudo-code on Wikipedia.
> // Pre-processing:
> // 1. append a single "1" bit to message
> // 2. append "0" bits until message length = 490 = -32
> (mod 512)
> // 3. append length of message (before pre-processing),
> in bits as 32-bit big-endian integer to message

You are not allowed to change the size of the 64-bit appendage or any
other part of the alogrithm. The specification calls for padding until
the length is congruent to 448 mod 512.
It looks as if this is the source of your incorrect digest.

>From Specification:
"append a single "1" bit to message
append "0" bits until message length = 448 = -64 (mod 512)
append length of message (before pre-processing), in bits as 64-bit
big-endian integer to message"

> word32 messageLength = message.size() * 32;
>
> // Step 1
> word32 padding = 0x8000000; // 1000 0000 0000 0000
> message.push_back(padding);

Are you sure you are on a big-endian machine? I tested your code and it
seems that youre machine is little endian. In any case, again, you
have to be fastidious in watching the spec in this section. In
general, you cannot slap on an entire 64-bit chunk onto the message.
For example, if the message is 440 bits long, you're only allowed to
add 1 byte for padding: the bit-pattern of a 1 followed by 7 zeros.
This will bring you to congruency mod 448, at which point you can tack
on the 64-bit length to get to a full 512 bits.

> // Step 3
> message.push_back(messageLength);

You punter. :)

Technically, you have to be prepared for crunching a digest whose size
is greater than (2^32). Here you're only taking care of lengths up to
2^32.

NOTES:

1. You want to be able to minimize required changes to code if you
suddendly decide to switch to a different hash algorithm, say SHA-256.
2. You want the output of the hashing to be an object itself so that
you can let it construct, assign it, compare it, store it in
containers, toss it around, etc., let it destruct.
3. Loading a sequence of byes into a vector<> just to hash might be
too slow.
4. You want the hash algorithm itself to be fast.

SUGGESTIONS:

I would define the algorithm itself as a namespace. Call it SHA_1:

namespace SHA_1;

Then you can define other hashing algorithms in their own namespaces:
namespace MD5;
namespace Adler32;

Within your SHA_1 namespace, define a Digest and a Hasher. A Digest
will contain your 160-bit output. A Digest should also be able to
yield itself as a std::string (for printing). The Hasher should
maintain a 64-byte state so that it can be used incrementally to hash
disjoint data buffers, maintaining remnants mod 64 bytes.

Overload two hashing functions in Hasher - one function for incremental
hashing, another that does the same thing as the first but also takes
by reference a Digest to collect the final digest and signal to the
Hasher that it should reset itself to prepare for the next stream to be
hashed:

namespace SHA-1
{
class Digest { unsigned char buffer[20];
operator string () const; // ...
bool operator == (const Digest &that) const;
bool operator != (const Digest &that) const {return !(*this ==
that);}

bool operator > (const Digest &that) const;
bool operator < (const Digest &that) const {return (that >
*this);}

bool operator >= (const Digest &that) const {return (*this >
that) || (*this == that);}
bool operator <= (const Digest &that) const {return (*this <
that) || (*this
} ;

class Hasher
{
unsigned char block[64]; // 512 bits crunched at a time.

unsigned long int lower_word_bit_count : 32;
unsigned long int upper_word_bit_count : 32;

unsigned long int h0 : 32;
unsigned long int h1 : 32;
unsigned long int h2 : 32;
unsigned long int h3 : 32;
unsigned long int h4 : 32;
};
Hasher &hash (const void *buffer, unsigned long int length);
Hasher &hash (const void *buffer, unsigned long int length,
Digest &digest);

}

The first hashing function should hash as many 512-bit chunks of the
incremental data streams as it can, retaining remnants of the
penultimate stream (mod 64-bytes) in its block, until the Digest is
asked for by the second hash function, at which point it can perform
one final hash of the ultimate stream, padding as necessary, tacking on
the 64-bit length, and crunching one final time the 64-byte block that
includes the 64-bit length.

Ron Rivest's implementation of MD5 hints at some of these principles,
but they are not readily apparent because the code is written in C.

See source code at end of: http://www.ietf.org/rfc/rfc1321.txt

-Le Chaud Lapin-

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Thomas J. Gritzan on 21 Nov 2005 04:54

gkoreman(a)gmail.com schrieb:
> I am trying to implement SHA1 based on the pseudo-code on Wikipedia.
> The pseudo-code is on:
> http://en.wikipedia.org/wiki/SHA-1
>
> My code is working, but is not giving me the correct hashes.
> This is being tested (initially) on a big-endian machine, and only
> after I have it working on big-endian was I planning on making it work
> on both big and little endian.
[...]
> // Step 1
> word32 padding = 0x8000000; // 1000 0000 0000 0000
> message.push_back(padding);
[...]

You forget a zero. Should be:

word32 padding = 0x80000000; // or: 1 << 31;

Works for me on a little-endian machine (only for zero-length messages).

Thomas

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: James Kanze on 21 Nov 2005 05:33

gkoreman(a)gmail.com wrote:

> I am trying to implement SHA1 based on the pseudo-code on
> Wikipedia. The pseudo-code is on:
> http://en.wikipedia.org/wiki/SHA-1

I doubt that this is the problem here, but I'd generally prefer
the reference site for such things. The Wiki article you site
gives a link, and the reference site
(http://www.ietf.org/rfc/rfc3174.txt) contains a reference
implementation and a lot more explications concerning the
implementation than the Wiki article.

> My code is working, but is not giving me the correct hashes.

So which is it? Is it working, or isn't it?

> This is being tested (initially) on a big-endian machine, and
> only after I have it working on big-endian was I planning on
> making it work on both big and little endian.

Shouldn't make a difference. (I've no experience with SHA-1,
but my implementation of MD5 works equaly well on both big and
little endian machines, without any conditionals or whatever.)

Since I haven't actually written an SHA-1 myself, I'll probably
miss a number of important points, but several things strike me
immediately.

> #include <stdio.h>
> #include <vector>
> #include <string>
> #include <cmath>
> using namespace std;

> #define ROTL(n,X) ( ( ( X ) << n ) | ( ( X ) >> ( 32 - n ) ) )

Why not an inline function?

> typedef unsigned int word32;

A better solution might be to use uint32_t, from <stdint.h>.
It's not standard C++ (yet), but it is standard C, is fairly
widespread, and easy to implement in the specific cases where it
is missing.

> class SHA1Hash
> {
> private:

> void Process(std::vector<word32>& message)

I'm not sure how you handle this. I would have expected a
std::string, or something similar as parameter. You're counting
on the user having packed his bytes into the word32 correctly.

> {
> // Pre-processing:
> // 1. append a single "1" bit to message
> // 2. append "0" bits until message length = 490 = -32
> (mod 512)
> // 3. append length of message (before pre-processing),
> in bits as 32-bit big-endian integer to message

> word32 messageLength = message.size() * 32;

Isn't there a risk of overflow here? (Not in your test case, of
course, but in the general case.) I use a uint64_t in my MD5
code.

> // Step 1
> word32 padding = 0x8000000; // 1000 0000 0000 0000

Again, I'm not sure about this. In the MD5 implementation, I
buffer bytes, and at this point, I append a *byte* with 0x80.
This may be a difference between the algorithms, however;
looking at my code, I have the impression that MD5 treats the in
little endian order when creating words. (At any rate, if you
are buffering bytes, appending 0x80 is the correct operation in
either case.)

Also, I avoid modifying the user's buffer. (In fact, I never see
the user's buffer; I work on either a pair of iterators, or
bytes passed in by means of the += operator.) I copy the bytes
into a local buffer until I have a block, then process that
block, and only save the resulting interblock state.

> message.push_back(padding);

> // Step 2
> padding = 0;
> while( (message.size() % 16) != 15 )
> message.push_back(padding);

Similar comments. I insert 0x00 bytes into my local buffer
until the total byte count modulo 512/8 is 448/8.

> // Step 3
> message.push_back(messageLength);

And here, I'm sure there is a mistake. The Wiki page says that
the message length to be appended is a 64 bit unsigned integer.
This means two insertions of the value you calculated as an
int64_t. From my own code (for MD5):

appendWord( length & 0xFFFFFFFF ) ;
appendWord( length >> 32 ) ;

You'd probably have to invert the order -- as I said, looking at
my own code, I think that MD5 is LSB first, and the Wiki page
says that SHA-1 is MSB first. (But I'd want to verify all of
the details in the RFC. Because there may be an issue of bit
order in the bytes as well which has to be addresses -- all of
the hardware I've ever used uses LSB bit order within bytes.)

> //
> // Actual Hash Code
> //

[I've cut the code here. Without detailed knowledge of the
algorithm, I can't really analyse it to see whether it
contains errors or not.]

>
> ... a public function that calls process

> I have tried it with an empty vector and get
> 5bc0ce0 b2008157 3de49bed 97b3e936 af3f86ce

> This is what I should get:
> da39a3ee5e6b4b0d3255bfef95601890afd80709

> I have looked through this for hours now and can't see
> anything wrong with it. Can anyone give me a hand?

For starters, append a 64 bit length, and not a 32 bit length.
And verify all of your byte ordering -- which is something that
I definitely wouldn't leave up to the caller.

> btw, this is NOT homework, this is a personal project.

It would be a pretty strange homework assignment. Unless the
prof needed it for one of his personal projects, of course:-).

--
James Kanze mailto: james.kanze(a)free.fr
Conseils en informatique orient?e objet/
Beratung in objektorientierter Datenverarbeitung
9 pl. Pierre S?mard, 78210 St.-Cyr-l'?cole, France +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

| Next | Last
Pages: 1 2 3 4 5
Prev: C++ Concepts
Next: Portable way to write binary data