From: jakecjacobson on 29 Jan 2010 12:25

I need to take an XML web resource and split it up into smaller XML files. I am able to retrieve the web resource, but I can't find any good XML examples. I am just learning Python, so forgive me if this question has been answered many times in the past.

My resource is like:

<document>
...
...
</document>
<document>
...
...
</document>

So in this example, I would need to output 2 files, each containing what is between the opening and closing document tags.
From: Adam Tauno Williams on 29 Jan 2010 13:04

On Fri, 2010-01-29 at 09:25 -0800, jakecjacobson wrote:
> I need to take a XML web resource and split it up into smaller XML
> files. I am able to retrieve the web resource but I can't find any
> good XML examples. I am just learning Python so forgive me if this
> question has been answered many times in the past.
> My resource is like:
> <document>
> ...
> ...
> </document>
> <document>
> ...
> ...
> </document>
> So in this example, I would need to output 2 files with the contents
> of each file what is between the open and close document tag.

Do you want to parse the document or use SAX?

I have a SAX example at
<http://coils.hg.sourceforge.net/hgweb/coils/coils/file/99b227b08f7f/src/coils/logic/workflow/xml/bpml.py>
From: jakecjacobson on 29 Jan 2010 13:34

On Jan 29, 1:04 pm, Adam Tauno Williams <awill...(a)opengroupware.us> wrote:
> On Fri, 2010-01-29 at 09:25 -0800, jakecjacobson wrote:
> > I need to take a XML web resource and split it up into smaller XML
> > files. I am able to retrieve the web resource but I can't find any
> > good XML examples. I am just learning Python so forgive me if this
> > question has been answered many times in the past.
> > My resource is like:
> > <document>
> > ...
> > ...
> > </document>
> > <document>
> > ...
> > ...
> > </document>
> > So in this example, I would need to output 2 files with the contents
> > of each file what is between the open and close document tag.
>
> Do you want to parse the document or SaX?
>
> I have a SaX example at
> <http://coils.hg.sourceforge.net/hgweb/coils/coils/file/99b227b08f7f/s...>

Thanks, but I am in way over my head with XML and Python. I am working with DDMS and need to output the individual resource nodes to their own files. I hope that this helps. I need a good example and an explanation of how to use it. Here is what a resource node looks like:

<ddms:Resource
    xsi:schemaLocation="https://metadata.dod.mil/mdr/ns/DDMS/1.4/ https://metadata.dod.mil/mdr/ns/DDMS/1.4/"
    xmlns:ddms="https://metadata.dod.mil/mdr/ns/DDMS/1.4/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ICISM="urn:us:gov:ic:ism:v2">
  <ddms:identifier ddms:qualifier="URL" ddms:value="https://metadata.dod.mil/mdr/ns/TBD/1.0/SampleTaxonomy.owl"/>
  <ddms:identifier ddms:qualifier="https://metadata.dod.mil/mdr/ns/MDR/1.0/MDR.owl#GovernanceNamespace" ddms:value="TBD"/>
  <ddms:identifier ddms:qualifier="Version" ddms:value="1.0"/>
  <ddms:title ICISM:ownerProducer="USA" ICISM:classification="U">Sample Taxonomy</ddms:title>
  <ddms:description ICISM:ownerProducer="USA" ICISM:classification="U">
    This is a sample taxonomy created for the Help page.
  </ddms:description>
  <ddms:dates ddms:posted="2007-11-24"/>
  <ddms:creator ICISM:ownerProducer="USA" ICISM:classification="U">
    <ddms:Person>
      <ddms:name>Sample</ddms:name>
      <ddms:surname>Developer</ddms:surname>
      <ddms:affiliation>FGM, Inc.</ddms:affiliation>
      <ddms:phone>703-885-1000</ddms:phone>
      <ddms:email>sampleDeveloper(a)fgm.com</ddms:email>
    </ddms:Person>
  </ddms:creator>
  <ddms:security ICISM:ownerProducer="USA" ICISM:classification="U" ICISM:nonICmarkings="DIST_STMT_A" />
  <!-- Other DDMS elements may appear here. -->
</ddms:Resource>

You can see the DDMS site at https://metadata.dod.mil/.
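For the DDMS case described above, a minimal sketch of the splitting step with xml.etree.ElementTree might look like the following. It is written for Python 3 and rests on an assumption the thread does not confirm: that the ddms:Resource elements arrive inside a single root element. The <Resources> wrapper, the function name split_resources, and the resource_N.xml file names are all hypothetical and only there for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Namespace URI taken from the sample resource node in the post.
DDMS_NS = "https://metadata.dod.mil/mdr/ns/DDMS/1.4/"

# Keep the "ddms" prefix when serializing instead of generated ns0, ns1, ...
ET.register_namespace("ddms", DDMS_NS)


def split_resources(xml_text, out_dir):
    """Write every ddms:Resource element in xml_text to its own file.

    Hypothetical helper: returns the list of file paths it created.
    """
    root = ET.fromstring(xml_text)
    paths = []
    # iter() walks the whole tree (root included), so nesting depth does not matter.
    for index, resource in enumerate(root.iter("{%s}Resource" % DDMS_NS), start=1):
        path = os.path.join(out_dir, "resource_%d.xml" % index)
        ET.ElementTree(resource).write(path, encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths


# Made-up input: two tiny resources inside an assumed wrapper element.
SAMPLE = """<Resources xmlns:ddms="https://metadata.dod.mil/mdr/ns/DDMS/1.4/">
  <ddms:Resource><ddms:identifier ddms:qualifier="Version" ddms:value="1.0"/></ddms:Resource>
  <ddms:Resource><ddms:identifier ddms:qualifier="Version" ddms:value="2.0"/></ddms:Resource>
</Resources>"""

if __name__ == "__main__":
    for created in split_resources(SAMPLE, tempfile.mkdtemp()):
        print(created)
```

If the real feed turns out to have several top-level elements and no wrapper, it is not well-formed XML and ET.fromstring will reject it; in that case the text would have to be cut into pieces before parsing.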
From: Stefan Behnel on 29 Jan 2010 14:24

jakecjacobson, 29.01.2010 18:25:
> I need to take a XML web resource and split it up into smaller XML
> files. I am able to retrieve the web resource but I can't find any
> good XML examples. I am just learning Python so forgive me if this
> question has been answered many times in the past.
>
> My resource is like:
>
> <document>
> ...
> ...
> </document>
> <document>
> ...
> ...
> </document>

Is this what you get as a document or is this just /contained/ in the document?

Note that XML does not allow more than one root element, so the above is not XML. Each of the two <document>...</document> parts forms an XML document by itself, though.

> So in this example, I would need to output 2 files with the contents
> of each file what is between the open and close document tag.

Are the two documents formatted as you show above? In that case, you can simply iterate over the lines and cut the document when you see "<document>". Or, if you are sure that "<document>" only appears as the top-most element and not inside of the documents, you can search for "<document>" in the content (a string, I guess) and split it there.

As was pointed out before, once you have these two documents, use the xml.etree package to work with them.

Something like this might work:

    import urllib2  # Python 2 standard library
    import xml.etree.ElementTree as ET

    # url is the address of the web resource
    data = urllib2.urlopen(url).read()

    for part in data.split('<document>'):
        if not part.strip():
            continue  # skip the empty piece before the first '<document>'
        document = ET.fromstring('<document>' + part)
        print(document.tag)
        # ... do other stuff

Stefan
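The string-splitting idea from this reply can be tried without a network call. The sketch below targets Python 3 and depends on the same condition Stefan states, namely that "<document>" appears only as a top-level element. The function name split_documents and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET


def split_documents(data, marker="<document>"):
    """Split concatenated <document> elements and parse each piece.

    Hypothetical helper: returns a list of parsed Element objects.
    """
    documents = []
    for part in data.split(marker):
        # The piece before the first marker is empty or whitespace, not a document.
        if not part.strip():
            continue
        # split() removed the marker, so put the opening tag back before parsing.
        documents.append(ET.fromstring(marker + part))
    return documents


# Made-up input shaped like the resource in the original question.
SAMPLE = (
    "<document><title>one</title></document>\n"
    "<document><title>two</title></document>\n"
)

if __name__ == "__main__":
    for number, doc in enumerate(split_documents(SAMPLE), start=1):
        # encoding="unicode" makes tostring() return str rather than bytes.
        print(number, ET.tostring(doc, encoding="unicode"))
```

Each parsed element could then be saved with ET.ElementTree(doc).write(filename). Note that this sketch only skips whitespace before the first marker; anything else there, such as an XML declaration, would need to be stripped first.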
From: Sells, Fred on 29 Jan 2010 14:31

Google is your friend. ElementTree is one of the better-documented modules, IMHO, but there are many modules that can do this.