Prev: C++ Concepts
Next: Portable way to write binary data
From: gkoreman on 20 Nov 2005 12:06 I am trying to implement SHA1 based on the pseudo-code on Wikipedia. The pseudo-code is on: http://en.wikipedia.org/wiki/SHA-1 My code is working, but is not giving me the correct hashes. This is being tested (initially) on a big-endian machine, and only after I have it working on big-endian was I planning on making it work on both big and little endian. #include <stdio.h> #include <vector> #include <string> #include <cmath> using namespace std; #define ROTL(n,X) ( ( ( X ) << n ) | ( ( X ) >> ( 32 - n ) ) ) typedef unsigned int word32; class SHA1Hash { private: void Process(std::vector<word32>& message) { // Pre-processing: // 1. append a single "1" bit to message // 2. append "0" bits until message length = 490 = -32 (mod 512) // 3. append length of message (before pre-processing), in bits as 32-bit big-endian integer to message word32 messageLength = message.size() * 32; // Step 1 word32 padding = 0x8000000; // 1000 0000 0000 0000 message.push_back(padding); // Step 2 padding = 0; while( (message.size() % 16) != 15 ) message.push_back(padding); // Step 3 message.push_back(messageLength); // // Actual Hash Code // unsigned int chunkIndex = 0; //Initialize variables: h0 = 0x67452301; h1 = 0xEFCDAB89; h2 = 0x98BADCFE; h3 = 0x10325476; h4 = 0xC3D2E1F0; // break message into 512-bit chunks while( chunkIndex < message.size() ) { // break chunk into sixteen 32-bit words w(i), 0 = i = 15 vector<word32> w(80, 0); for( unsigned int i = 0; i < 16; i++ ) w[i] = message[chunkIndex + i]; for( unsigned int i = 16; i < 80; i++ ) w[i] = ROTL( 1, (w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16])); word32 a = h0; word32 b = h1; word32 c = h2; word32 d = h3; word32 e = h4; for( unsigned int i = 0; i < 80; i++ ) { word32 f = 0; word32 k = 0; if( i < 20 ) { f = (b & c) | ((~b) & d); k = 0x5A827999; } else if( i < 40 ) { f = b ^ c ^ d; k = 0x6ED9EBA1; } else if( i < 60 ) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; } else if( i < 80 ) { f = b ^ c ^ d; k = 0xCA62C1D6; } word32 temp = ROTL(5, a) + f + e + k + w[i]; e = d; d = c; c = ROTL(30, b); b = a; a = temp; } // Add this chunk's hash to result so far: h0 = h0 + a; h1 = h1 + b; h2 = h2 + c; h3 = h3 + d; h4 = h4 + e; chunkIndex+=16; } } ... a public function that calls process I have tried it with an empty vector and get 5bc0ce0 b2008157 3de49bed 97b3e936 af3f86ce This is what I should get: da39a3ee5e6b4b0d3255bfef95601890afd80709 I have looked through this for hours now and can't see anything wrong with it. Can anyone give me a hand? btw, this is NOT homework, this is a personal project. Thanks [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Carlos Moreno on 21 Nov 2005 04:55 gkoreman(a)gmail.com wrote: > I have tried it with an empty vector and get > 5bc0ce0b20081573de49bed97b3e936af3f86ce > > This is what I should get: > da39a3ee5e6b4b0d3255bfef95601890afd80709 > > I have looked through this for hours now and can't see anything wrong > with it. Can anyone give me a hand? Two things -- your string is shorter than the expected/required 160 bits. I removed the spaces from your output, and the end is not aligned with the end of what you should get (I'm looking at it with non-proportional spacing font). Most likely, this is because of one of your cout statements that is supposed to output 0X (where X is whatever digit), and it's choosing to output X alone, omiting the leading 0. Also, pay attention to endianness -- perhaps after adding the presumably missing 0, you may notice a pattern of reversed bytes; the SHA1 description talks about concatenated and expressed as big-endian -- if you ran this on an x86 processor and did not pay attention to endianness, you might end up getting the wrong result. HTH, Carlos -- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Le Chaud Lapin on 21 Nov 2005 04:51 gkoreman(a)gmail.com wrote: > I am trying to implement SHA1 based on the pseudo-code on Wikipedia. > // Pre-processing: > // 1. append a single "1" bit to message > // 2. append "0" bits until message length = 490 = -32 > (mod 512) > // 3. append length of message (before pre-processing), > in bits as 32-bit big-endian integer to message You are not allowed to change the size of the 64-bit appendage or any other part of the alogrithm. The specification calls for padding until the length is congruent to 448 mod 512. It looks as if this is the source of your incorrect digest. >From Specification: "append a single "1" bit to message append "0" bits until message length = 448 = -64 (mod 512) append length of message (before pre-processing), in bits as 64-bit big-endian integer to message" > word32 messageLength = message.size() * 32; > > // Step 1 > word32 padding = 0x8000000; // 1000 0000 0000 0000 > message.push_back(padding); Are you sure you are on a big-endian machine? I tested your code and it seems that youre machine is little endian. In any case, again, you have to be fastidious in watching the spec in this section. In general, you cannot slap on an entire 64-bit chunk onto the message. For example, if the message is 440 bits long, you're only allowed to add 1 byte for padding: the bit-pattern of a 1 followed by 7 zeros. This will bring you to congruency mod 448, at which point you can tack on the 64-bit length to get to a full 512 bits. > // Step 3 > message.push_back(messageLength); You punter. :) Technically, you have to be prepared for crunching a digest whose size is greater than (2^32). Here you're only taking care of lengths up to 2^32. NOTES: 1. You want to be able to minimize required changes to code if you suddendly decide to switch to a different hash algorithm, say SHA-256. 2. You want the output of the hashing to be an object itself so that you can let it construct, assign it, compare it, store it in containers, toss it around, etc., let it destruct. 3. Loading a sequence of byes into a vector<> just to hash might be too slow. 4. You want the hash algorithm itself to be fast. SUGGESTIONS: I would define the algorithm itself as a namespace. Call it SHA_1: namespace SHA_1; Then you can define other hashing algorithms in their own namespaces: namespace MD5; namespace Adler32; Within your SHA_1 namespace, define a Digest and a Hasher. A Digest will contain your 160-bit output. A Digest should also be able to yield itself as a std::string (for printing). The Hasher should maintain a 64-byte state so that it can be used incrementally to hash disjoint data buffers, maintaining remnants mod 64 bytes. Overload two hashing functions in Hasher - one function for incremental hashing, another that does the same thing as the first but also takes by reference a Digest to collect the final digest and signal to the Hasher that it should reset itself to prepare for the next stream to be hashed: namespace SHA-1 { class Digest { unsigned char buffer[20]; operator string () const; // ... bool operator == (const Digest &that) const; bool operator != (const Digest &that) const {return !(*this == that);} bool operator > (const Digest &that) const; bool operator < (const Digest &that) const {return (that > *this);} bool operator >= (const Digest &that) const {return (*this > that) || (*this == that);} bool operator <= (const Digest &that) const {return (*this < that) || (*this } ; class Hasher { unsigned char block[64]; // 512 bits crunched at a time. unsigned long int lower_word_bit_count : 32; unsigned long int upper_word_bit_count : 32; unsigned long int h0 : 32; unsigned long int h1 : 32; unsigned long int h2 : 32; unsigned long int h3 : 32; unsigned long int h4 : 32; }; Hasher &hash (const void *buffer, unsigned long int length); Hasher &hash (const void *buffer, unsigned long int length, Digest &digest); } The first hashing function should hash as many 512-bit chunks of the incremental data streams as it can, retaining remnants of the penultimate stream (mod 64-bytes) in its block, until the Digest is asked for by the second hash function, at which point it can perform one final hash of the ultimate stream, padding as necessary, tacking on the 64-bit length, and crunching one final time the 64-byte block that includes the 64-bit length. Ron Rivest's implementation of MD5 hints at some of these principles, but they are not readily apparent because the code is written in C. See source code at end of: http://www.ietf.org/rfc/rfc1321.txt -Le Chaud Lapin- [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: Thomas J. Gritzan on 21 Nov 2005 04:54 gkoreman(a)gmail.com schrieb: > I am trying to implement SHA1 based on the pseudo-code on Wikipedia. > The pseudo-code is on: > http://en.wikipedia.org/wiki/SHA-1 > > My code is working, but is not giving me the correct hashes. > This is being tested (initially) on a big-endian machine, and only > after I have it working on big-endian was I planning on making it work > on both big and little endian. [...] > // Step 1 > word32 padding = 0x8000000; // 1000 0000 0000 0000 > message.push_back(padding); [...] You forget a zero. Should be: word32 padding = 0x80000000; // or: 1 << 31; Works for me on a little-endian machine (only for zero-length messages). Thomas [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ]
From: James Kanze on 21 Nov 2005 05:33
gkoreman(a)gmail.com wrote: > I am trying to implement SHA1 based on the pseudo-code on > Wikipedia. The pseudo-code is on: > http://en.wikipedia.org/wiki/SHA-1 I doubt that this is the problem here, but I'd generally prefer the reference site for such things. The Wiki article you site gives a link, and the reference site (http://www.ietf.org/rfc/rfc3174.txt) contains a reference implementation and a lot more explications concerning the implementation than the Wiki article. > My code is working, but is not giving me the correct hashes. So which is it? Is it working, or isn't it? > This is being tested (initially) on a big-endian machine, and > only after I have it working on big-endian was I planning on > making it work on both big and little endian. Shouldn't make a difference. (I've no experience with SHA-1, but my implementation of MD5 works equaly well on both big and little endian machines, without any conditionals or whatever.) Since I haven't actually written an SHA-1 myself, I'll probably miss a number of important points, but several things strike me immediately. > #include <stdio.h> > #include <vector> > #include <string> > #include <cmath> > using namespace std; > #define ROTL(n,X) ( ( ( X ) << n ) | ( ( X ) >> ( 32 - n ) ) ) Why not an inline function? > typedef unsigned int word32; A better solution might be to use uint32_t, from <stdint.h>. It's not standard C++ (yet), but it is standard C, is fairly widespread, and easy to implement in the specific cases where it is missing. > class SHA1Hash > { > private: > void Process(std::vector<word32>& message) I'm not sure how you handle this. I would have expected a std::string, or something similar as parameter. You're counting on the user having packed his bytes into the word32 correctly. > { > // Pre-processing: > // 1. append a single "1" bit to message > // 2. append "0" bits until message length = 490 = -32 > (mod 512) > // 3. append length of message (before pre-processing), > in bits as 32-bit big-endian integer to message > word32 messageLength = message.size() * 32; Isn't there a risk of overflow here? (Not in your test case, of course, but in the general case.) I use a uint64_t in my MD5 code. > // Step 1 > word32 padding = 0x8000000; // 1000 0000 0000 0000 Again, I'm not sure about this. In the MD5 implementation, I buffer bytes, and at this point, I append a *byte* with 0x80. This may be a difference between the algorithms, however; looking at my code, I have the impression that MD5 treats the in little endian order when creating words. (At any rate, if you are buffering bytes, appending 0x80 is the correct operation in either case.) Also, I avoid modifying the user's buffer. (In fact, I never see the user's buffer; I work on either a pair of iterators, or bytes passed in by means of the += operator.) I copy the bytes into a local buffer until I have a block, then process that block, and only save the resulting interblock state. > message.push_back(padding); > // Step 2 > padding = 0; > while( (message.size() % 16) != 15 ) > message.push_back(padding); Similar comments. I insert 0x00 bytes into my local buffer until the total byte count modulo 512/8 is 448/8. > // Step 3 > message.push_back(messageLength); And here, I'm sure there is a mistake. The Wiki page says that the message length to be appended is a 64 bit unsigned integer. This means two insertions of the value you calculated as an int64_t. From my own code (for MD5): appendWord( length & 0xFFFFFFFF ) ; appendWord( length >> 32 ) ; You'd probably have to invert the order -- as I said, looking at my own code, I think that MD5 is LSB first, and the Wiki page says that SHA-1 is MSB first. (But I'd want to verify all of the details in the RFC. Because there may be an issue of bit order in the bytes as well which has to be addresses -- all of the hardware I've ever used uses LSB bit order within bytes.) > // > // Actual Hash Code > // [I've cut the code here. Without detailed knowledge of the algorithm, I can't really analyse it to see whether it contains errors or not.] > > ... a public function that calls process > I have tried it with an empty vector and get > 5bc0ce0 b2008157 3de49bed 97b3e936 af3f86ce > This is what I should get: > da39a3ee5e6b4b0d3255bfef95601890afd80709 > I have looked through this for hours now and can't see > anything wrong with it. Can anyone give me a hand? For starters, append a 64 bit length, and not a 32 bit length. And verify all of your byte ordering -- which is something that I definitely wouldn't leave up to the caller. > btw, this is NOT homework, this is a personal project. It would be a pretty strange homework assignment. Unless the prof needed it for one of his personal projects, of course:-). -- James Kanze mailto: james.kanze(a)free.fr Conseils en informatique orient?e objet/ Beratung in objektorientierter Datenverarbeitung 9 pl. Pierre S?mard, 78210 St.-Cyr-l'?cole, France +33 (0)1 30 23 00 34 [ See http://www.gotw.ca/resources/clcm.htm for info about ] [ comp.lang.c++.moderated. First time posters: Do this! ] |