From: Huber57 on 31 Mar 2010 10:11

To whom:

I have a directory with a number of .dita files in it (each can be opened in Notepad). Inside these files are differing numbers of 'keywords' and 'index terms'. Each of these terms is enclosed in tags, e.g.:

<keyword>help</keyword>
or
<indexterm>data administration</indexterm>

I would like to be able to run a script to pull these keywords and index terms out and place them in either an MS Word doc or an MS Excel spreadsheet.

Am I in the right forum? How would I go about doing this?

Sincerely,
Doug
From: Bob Barrows on 31 Mar 2010 14:01

So the idea is to get just the two elements from 506 files into a spreadsheet? Or is it enough to simply get all the data from all 506 files into a single spreadsheet?

If the former, I still need to see the entire structure of at least one of the rows of data (two rows would be better).

If the latter, assuming that all 506 files are in a single folder, it shouldn't be too hard to use the MSXML parser in combination with the FileSystemObject to loop through the files and append the contents of each file to a single XML document. Again, though, if you need more specific help, you need to provide more information.

Huber57 wrote:
> Bob,
>
> Thanks much for the reply. I renamed one of the files (to an .xml
> extension) and opened it in Excel, and it (very nicely) dropped the file
> into the spreadsheet with headers and the data listed below.
>
> Unfortunately, I have 506 files. I was hoping to automate. I have
> never done any scripting before.
>
> Thoughts?
>
> Doug
>
> "Bob Barrows" wrote:
>
>> This would be a trivial problem if the files contained well-formed
>> XML, as your samples make it appear. The problem is, we cannot be
>> sure they really contain well-formed XML based on what you've
>> described. You need to show us an actual sample of the data
>> contained in one of these files.
>>
>> To illustrate how trivial this problem might be, create a text file
>> containing nothing but:
>>
>> <items>
>>   <item>
>>     <keyword>help</keyword>
>>     <indexterm>data administration</indexterm>
>>   </item>
>>   <item>
>>     <keyword>help</keyword>
>>     <indexterm>network administration</indexterm>
>>   </item>
>> </items>
>>
>> Save it as xmltest.xml. Then open Excel, click the Open button on the
>> toolbar, navigate to the folder containing the file you just saved,
>> change the file type to XML Files so you can see the file you saved,
>> and open it. Excel will prompt you to tell it how to handle it;
>> tell it to import it as an XML List.
>>
>> If the files don't really contain valid, well-formed XML, we will
>> need to see more of what they contain if you need more than generic
>> advice.

--
Microsoft MVP - ASP/ASP.NET - 2004-2007
Please reply to the newsgroup. This email account is my spam trap so I don't check it very often. If you must reply off-line, then remove the "NO SPAM"
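A minimal sketch of the "latter" option above (appending every file to a single XML document) follows. It swaps Python's standard xml.etree.ElementTree in for MSXML and the FileSystemObject; the wrapper element name dita_files and the source attribute are hypothetical, and each file is assumed to be well-formed XML on its own. Wrapping everything in one new root element is what keeps the combined result a legal XML document.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def merge_dita_files(folder: str, pattern: str = "*.dita") -> ET.Element:
    """Append the root element of every matching file under one new root.

    Assumes each file is well-formed XML on its own; a file that is not
    raises xml.etree.ElementTree.ParseError.
    """
    merged = ET.Element("dita_files")  # hypothetical wrapper element name
    for path in sorted(Path(folder).glob(pattern)):
        root = ET.parse(path).getroot()
        root.set("source", path.name)  # hypothetical attribute recording the origin file
        merged.append(root)
    return merged
```

The merged tree could then be saved with `ET.ElementTree(merged).write("all_topics.xml", encoding="utf-8", xml_declaration=True)` (file name hypothetical) and opened the same way as xmltest.xml, although how Excel flattens a deeper structure than the xmltest example is not guaranteed.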
From: Huber57 on 31 Mar 2010 14:46

Bob,

I would prefer the former (1 spreadsheet, all index terms and keywords).

Here is some sample code.

<title>Records</title>
<prolog>
<author>Mystery Writers</author>

<metadata><keywords>
<keyword>complication</keyword>
<keyword>complications</keyword>
<keyword>data</keyword>
<keyword>health</keyword>
<keyword>info</keyword>
<keyword>information</keyword>
<keyword>logbook</keyword>
<keyword>logbooks</keyword>
<keyword>my</keyword>
<indexterm>complications</indexterm>
<indexterm>logbook and records, complications</indexterm>
</keywords></metadata>
</prolog>

<conbody>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Vestibulum felis massa, ultricies eu auctor in, aliquam et mauris.
Sed lobortis facilisis nisl, vitae sagittis eros interdum ac. In
dolor velit, </p>
</conbody>
<related-links>
<linklist>
<title>Related Links</title>
<link href=otherrecords.dita"><linktext>Other Records</linktext></link>
</linklist>
</related-links>
</concept>

Please let me know if you need anything else.
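Before scripting 506 files, it helps to check whether a file parses as XML at all. The minimal sketch below uses Python's standard library rather than anything from the thread, and the helper name parse_error is hypothetical. The sample above has sibling top-level elements (<title>, <prolog>, ...) and an href value missing its opening quote, so a parser rejects it. A closing </concept> tag normally belongs to the root element of a DITA concept topic, so the opening <concept> tag may simply have been left out of the paste; that is an assumption about the real files.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_error(text: str) -> Optional[str]:
    """Return None if `text` parses as one XML document, else the parser's message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


# Two sibling top-level elements, as in the pasted sample: not a well-formed document.
fragment = "<title>Records</title>\n<prolog><author>Mystery Writers</author></prolog>"
print(parse_error(fragment))

# The same content nested inside one root element parses cleanly.
print(parse_error("<dita>" + fragment + "</dita>"))

# An attribute value missing its opening quote is also rejected.
print(parse_error('<linklist><link href=otherrecords.dita">Other Records</link></linklist>'))
```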
From: Bob Barrows on 31 Mar 2010 16:23

I will not be able to return to this until tonight. Hopefully someone else will beat me to it, but if not, I'll check back then.
From: Bob Barrows on 31 Mar 2010 21:31
The problem with this sample is that there is no root element. If the entire content were nested inside a single element, perhaps called "dita" (see below for what I am talking about), there would be no problem. I created a test file with your data, opened it in Excel, and got the expected error, "document can contain only one top element". So, given your statement that you were able to open one of these files in Excel, I have to conclude that you have not shown me the actual structure.

There are other syntax problems with this data (a missing opening quote on the href attribute, and a closing </concept> tag with no opening tag) that I will have to correct as well. I have changed your sample data into the correct format so I could test my code; the corrected sample follows, and the code is at the bottom of this post. It's quick and dirty, but it is tested and it works.

Huber57 wrote:
> Bob,
>
> I would prefer the former (1 spreadsheet, all index terms and
> keywords).
>
> Here is some sample code.
>

<dita>
<title>Records</title>
<prolog>
<author>Mystery Writers</author>
<metadata><keywords>
<keyword>complication</keyword>
<keyword>complications</keyword>
<keyword>data</keyword>
<keyword>health</keyword>
<keyword>info</keyword>
<keyword>information</keyword>
<keyword>logbook</keyword>
<keyword>logbooks</keyword>
<keyword>my</keyword>
<indexterm>complications</indexterm>
<indexterm>logbook and records, complications</indexterm>
</keywords></metadata>
</prolog>
<conbody>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Vestibulum felis massa, ultricies eu auctor in, aliquam et mauris.
Sed lobortis facilisis nisl, vitae sagittis eros interdum ac.
In dolor velit, </p>
</conbody>
<related-links>
<linklist>
<title>Related Links</title>
<link href="otherrecords.dita"><linktext>Other Records</linktext></link>
</linklist>
</related-links>
<concept></concept>
</dita>

************************************************************************************

Option Explicit

Dim fso, fldr, fil, xmldoc, nodes, kwnode, itnode, xl, wb, ws, kwrow, itrow
Dim pathtofiles

'replace with your path
pathtofiles = "c:\filelib"

Set fso = CreateObject("Scripting.FileSystemObject")
Set fldr = fso.GetFolder(pathtofiles & "\dita")

Set xmldoc = CreateObject("MSXML2.DOMDocument")
xmldoc.async = False                     'load each file synchronously
xmldoc.setProperty "SelectionLanguage", "XPath"
xmldoc.validateOnParse = False           'do not validate against a DTD
xmldoc.resolveExternals = False          'do not try to fetch a DOCTYPE's DTD

Set xl = CreateObject("Excel.Application")
xl.DisplayAlerts = False                 'no overwrite prompt from the hidden Excel
Set wb = xl.Workbooks.Add
Set ws = wb.Sheets(1)
ws.Name = "dita_values"

'header row
ws.Cells(1, 1).Value = "Keywords"
ws.Cells(1, 2).Value = "Index Terms"
ws.Range("A1:B1").Font.Bold = True

'keywords and index terms fill their own columns independently
kwrow = 2
itrow = 2

For Each fil In fldr.Files
  If LCase(fso.GetExtensionName(fil.Name)) = "dita" Then
    If xmldoc.Load(fil.Path) Then
      'selectNodes returns an empty list (never Nothing) when nothing matches
      Set nodes = xmldoc.selectNodes("//keyword")
      If nodes.Length = 0 Then MsgBox "No keywords were found in " & fil.Name
      For Each kwnode In nodes
        ws.Cells(kwrow, 1).Value = kwnode.Text
        kwrow = kwrow + 1
      Next

      Set nodes = xmldoc.selectNodes("//indexterm")
      For Each itnode In nodes
        ws.Cells(itrow, 2).Value = itnode.Text
        itrow = itrow + 1
      Next
    Else
      MsgBox fil.Name & " could not be parsed: " & xmldoc.parseError.reason
    End If
  End If
Next

'in Excel 2007 or later, a FileFormat argument (56 = .xls) may be needed here
wb.SaveAs pathtofiles & "\keyword_indexterms.xls"
xl.Quit
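For comparison, here is a rough Python equivalent of the script above. It is a sketch using only the standard library (xml.etree.ElementTree and csv) in place of MSXML and Excel automation, and it writes a CSV file that Excel can open instead of driving Excel directly. The function names are hypothetical, and it assumes each .dita file is well-formed; a DOCTYPE is tolerated, but entities defined only in an external DTD (such as &nbsp;) would make a file fail to parse.

```python
import csv
import xml.etree.ElementTree as ET
from itertools import zip_longest
from pathlib import Path
from typing import List, Tuple, Union


def extract_terms(xml_data: Union[str, bytes]) -> Tuple[List[str], List[str]]:
    """Return (keywords, index terms) found anywhere in one XML document.

    Mirrors the //keyword and //indexterm queries in the VBScript:
    every matching element counts, not only those inside <keywords>.
    """
    root = ET.fromstring(xml_data)
    # itertext() joins the text of nested children, roughly like MSXML's .text
    keywords = ["".join(el.itertext()).strip() for el in root.iter("keyword")]
    indexterms = ["".join(el.itertext()).strip() for el in root.iter("indexterm")]
    return keywords, indexterms


def write_terms_csv(folder: str, out_csv: str) -> int:
    """Collect terms from every *.dita file in `folder` into a two-column CSV.

    As in the VBScript, the two columns fill independently, so a row does
    not pair a keyword with an index term. Returns the number of files read.
    """
    all_keywords: List[str] = []
    all_indexterms: List[str] = []
    files = sorted(Path(folder).glob("*.dita"))
    for path in files:
        # bytes let the parser honour any encoding declared in the file
        keywords, indexterms = extract_terms(path.read_bytes())
        all_keywords.extend(keywords)
        all_indexterms.extend(indexterms)

    # utf-8-sig writes a byte-order mark, which helps Excel detect UTF-8
    with open(out_csv, "w", newline="", encoding="utf-8-sig") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Keywords", "Index Terms"])
        for keyword, indexterm in zip_longest(all_keywords, all_indexterms, fillvalue=""):
            writer.writerow([keyword, indexterm])
    return len(files)
```

With the folder layout from the VBScript, a call such as `write_terms_csv(r"c:\filelib\dita", r"c:\filelib\keyword_indexterms.csv")` would produce the same two columns; unlike the VBScript, a file that fails to parse stops the run with a ParseError naming the line and column.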