From: Franz Bachler on 20 Jul 2010 17:01

Hello,

does anyone have a code sample showing how to detect which Word version a document was created with? (In C if possible.)

worddetect testfile.doc

"testfile.doc" is in the format of WinWord 6 / 95 / 97 / 2000 / 2002 / 2003 or 2007.

Greetings, Franz
--
Franz Bachler, A-3250 Wieselburg
E-Mail: fraba (at) gmx.at
Homepage: http://members.aon.at/fraba or http://home.pages.at/fraba
From: Jongware on 21 Jul 2010 04:25

On 20-Jul-10 23:01 PM, Franz Bachler wrote:
> Hello,
>
> has anyone a code sample how to detect the word version with which the
> document was created?
> (in C if possible)
>
> worddetect testfile.doc
>
> "testfile.doc" has the format of winword 6 / 95 / 97 / 2000 / 2002 / 2003
> or 2007

Check http://en.wikipedia.org/wiki/DOC_(computing) -- its 3rd reference points to Microsoft's documentation of the binary Word file format, as a PDF.

The main data structure is called the FIB ("File Information Block"), and its member 'nFib' holds a code for Word versions from 1.0 to Word 2007 (listed on p. 133 in that PDF).

[Jw]
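
To make the idea of a version code concrete, here is a minimal sketch of an nFib lookup. The codes are the ones used in the program posted later in this thread; the function name word_version_from_nfib is made up for illustration, and the list is not claimed to be complete.

```c
#include <stddef.h>

/* Map an nFib code to a Word version name.
   Codes taken from this thread; returns NULL for anything else.
   word_version_from_nfib is a hypothetical helper name. */
const char *word_version_from_nfib(unsigned int nfib)
{
    switch (nfib) {
    case 0x0065: return "6.0";   /* 101 */
    case 0x0068: return "95";    /* 104 */
    case 0x00C1: return "97";    /* 193 */
    case 0x00D9: return "2000";  /* 217 */
    case 0x0101: return "2002";  /* 257 */
    case 0x010C: return "2003";  /* 268 */
    case 0x0112: return "2007";  /* 274 */
    default:     return NULL;    /* unknown or not listed here */
    }
}
```

A caller would print the result only when it is not NULL, and otherwise report the raw code.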
From: Franz Bachler on 21 Jul 2010 16:17

> Check http://en.wikipedia.org/wiki/DOC_(computing) -- its 3rd reference
> points to Microsoft's binary documentation of the Word file format, as a
> PDF.
>
> The main data structure is called the FIB ("File Information Block"), and its
> structure member 'nFib' holds a code for Word versions from 1.0 to Word
> 2007 (listed on p. 133 in that PDF).

Okay, nFib is the value I'm looking for. But I don't understand exactly how to find where nFib is in the Word file. Is it always in the same place?

An example: Word 2003; nFib = decimal 268 = hex 010C; it should be stored as 0C 01 (little endian).

The first hit is at 0007BC + 1 (because the hex dump starts at 000000) = decimal 1981:

0007B0 00 00 FF FF FF FF 00 00 00 00 02 00 0C 01 00 00 *................*

Greetings, Franz
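
The byte order in the example above can be checked with a few lines of C. This is only a sketch of reading a 16-bit little-endian value; the helper name read_u16_le is an assumption, not something from the Word documentation. Note that finding the byte pair 0C 01 somewhere in a file does not by itself prove it is nFib, since the same two bytes can occur anywhere.

```c
#include <stdint.h>

/* Combine two consecutive bytes into a 16-bit value, low byte first.
   read_u16_le is a hypothetical helper name. */
uint16_t read_u16_le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

For the bytes 0C 01 this yields 0x010C, which is decimal 268.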
From: Jongware on 22 Jul 2010 05:45

On 21-Jul-10 22:17 PM, Franz Bachler wrote:
>> Check http://en.wikipedia.org/wiki/DOC_(computing) -- its 3rd reference
>> points to Microsoft's binary documentation of the Word file format, as a
>> PDF.
>>
>> The main data structure is called the FIB ("File Information Block"), and its
>> structure member 'nFib' holds a code for Word versions from 1.0 to Word
>> 2007 (listed on p. 133 in that PDF).
>
> Okay, the nFib is the searched value. But I don't understand exactly how to
> detect where the nFib is in the Word File. Is it always on the same place?
>
> An Example: Word 2003; nFib = decimal 268 = Hex 010C; should be stored as 0C
> 01 (little endian)
>
> The first hit is 0007BC + 1 (because hex dump starts with 000000) = decimal
> 1981
>
> 0007B0 00 00 FF FF FF FF 00 00 00 00 02 00 0C 01 00 00
> *................*

Yes -- Microsoft's documentation is not too clear (in spite of their "Open Specification Promise" ;-)). Word files start with (hex) D0CF, but so do a lot of other files: they are all OLE compound files. MS does provide a number of APIs to open OLE streams and select any arbitrary section for reading, but perhaps you don't need that *just to check the version*. At least the first 512 bytes belong to the OLE container only (and when there are more blocks, their size is always a multiple of 512).

According to MS, "The FIB starts at the beginning of the file" -- well, it's the first thing in the "WordDocument" stream of a Word file, so that's /almost/ true ...

You can check whether you have found a FIB by checking its magic number 'wIdent', the first ushort; its value is not given (... oh well ...) but I see in a Word sample file that it should be 0xA5EC. An additional check is at +0x22: "wMagicCreated / Unique number identifying the file's creator. 0x6A62 is the creator ID for Word". The nFib ushort is right after wIdent -- for my file, I find a value of 0x00C1, or 193, indicating it's from Word 97.
(More on OLE streams can be found on http://download.microsoft.com/.../WindowsCompoundBinaryFileFormatSpecification.pdf) [Jw]
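
The checks described above could be sketched roughly as follows. This is a shortcut under stated assumptions, not a real OLE parser: it assumes 512-byte sectors and simply looks for wIdent (0xA5EC) at each sector boundary instead of following the compound file's directory to the stream. The full 8-byte signature comes from the compound file specification; the function names find_nfib and u16le are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 16-bit little-endian value (hypothetical helper). */
static uint16_t u16le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Return the nFib of the first FIB-like block found on a 512-byte
   boundary, or 0 if the buffer is not an OLE compound file or no
   block starting with wIdent 0xA5EC is found.
   Heuristic only: assumes 512-byte sectors. */
unsigned int find_nfib(const unsigned char *buf, size_t len)
{
    /* Compound file signature; the thread only mentions the D0 CF prefix. */
    static const unsigned char ole_sig[8] =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    size_t off;

    if (len < 512 || memcmp(buf, ole_sig, sizeof ole_sig) != 0)
        return 0;

    /* The first 512 bytes are the container header, so start after it. */
    for (off = 512; off + 4 <= len; off += 512) {
        if (u16le(buf + off) == 0xA5EC)      /* wIdent */
            return u16le(buf + off + 2);     /* nFib follows directly */
    }
    return 0;
}
```

Scanning sector boundaries keeps the code short, but a file with a different sector size or an unlucky byte pattern could fool it; walking the compound file directory would be the robust route.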
From: Franz Bachler on 26 Jul 2010 08:21
Hello,

here's the DOC-checking program.

Greetings, Franz

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look for the FIB magic (EC A5) on 512-byte boundaries and decode nFib. */
void nfibsearch(const unsigned char *szPuffer, long ds)
{
    long i;
    unsigned int nFib;
    const char *szVer;

    for (i = 512; i < 4100; i += 512) {
        if (i + 3 >= ds) break;                 /* need 4 bytes at offset i */
        if (szPuffer[i] != 0xEC || szPuffer[i+1] != 0xA5) continue;

        printf("\n\n nFib Magic (EC A5) found at %ld ", i);

        /* nFib is stored little endian right after the magic */
        nFib = szPuffer[i+2] | (szPuffer[i+3] << 8);
        switch (nFib) {
            case 0x0065: szVer = "6.0";  break;
            case 0x0068: szVer = "95";   break;
            case 0x00C1: szVer = "97";   break;
            case 0x00D9: szVer = "2000"; break;
            case 0x0101: szVer = "2002"; break;
            case 0x010C: szVer = "2003"; break;
            case 0x0112: szVer = "2007"; break;
            default:     szVer = NULL;   break;
        }
        if (szVer)
            printf("\n\n Word %s identifier (%02X %02X) found at %ld ",
                   szVer, szPuffer[i+2], szPuffer[i+3], i + 2);
    }
}

/* Report every occurrence of szText together with the 3 bytes after it. */
void stringsearch(const unsigned char *szPuffer, const char *szText, long ds)
{
    long i;
    size_t j, l = strlen(szText);
    char szInfo[128];

    if (l == 0 || l > 120) return;

    /* stop early enough that the match and its 3 trailing bytes fit */
    for (i = 0; i + (long)l + 3 <= ds; i++) {
        if (memcmp(szPuffer + i, szText, l) != 0) continue;
        memcpy(szInfo, szText, l);
        for (j = 0; j < 3; j++) szInfo[l+j] = (char)szPuffer[i+l+j];
        szInfo[l+3] = '\0';
        printf("\n %s found at %ld ", szInfo, i);
    }
}

int main(int argc, char **argv)
{
    long iSize, ds;
    unsigned char *szPuffer;
    FILE *dz;

    if (argc < 2) {
        printf("\n Word Document Evaluation - call with ");
        printf("\n\n %s filename \n ", argv[0]);
        exit(1);
    }

    if ((dz = fopen(argv[1], "rb")) == NULL) {
        printf("\n Cannot open file %s! ", argv[1]);
        printf("\n (Possibly file not found?) \n ");
        exit(2);
    }

    /* determine the file size */
    fseek(dz, 0, SEEK_END);
    iSize = ftell(dz);
    fseek(dz, 0, SEEK_SET);
    if (iSize <= 0) {
        printf("\n Problem with file %s \n ", argv[1]);
        fclose(dz);
        exit(3);
    }

    szPuffer = calloc((size_t)iSize + 64, 1);
    if (szPuffer == NULL) {
        printf("\n Unable to allocate buffer memory! ");
        printf("\n (Out of memory?) \n ");
        fclose(dz);
        exit(4);
    }

    /* read the whole file in one call */
    ds = (long)fread(szPuffer, 1, (size_t)iSize, dz);
    fclose(dz);
    printf("\n File %s - %ld Bytes read \n", argv[1], ds);

    /* D0 CF 11 is the start of the OLE header shared by all such files */
    if (ds >= 3 && szPuffer[0] == 0xD0 && szPuffer[1] == 0xCF && szPuffer[2] == 0x11)
        printf("\n Word header found (D0 CF 11) \n");
    else
        printf("\n Word header not found (D0 CF 11) \n");

    /* search for "Word.Document." */
    stringsearch(szPuffer, "Word.Document.", ds);

    /* search for "Microsoft Word " */
    stringsearch(szPuffer, "Microsoft Word ", ds);

    /* nFib search */
    nfibsearch(szPuffer, ds);

    printf("\n");
    free(szPuffer);
    return 0;
}