From: jerem on 21 Oct 2009 01:59

Hey Graham,

Nice to hear from you too. In answer to your questions: in terms of getting a better PDF converter, the converter I work with works excellently when it has a clean, or even a decent, copy to work with. Unfortunately, many of the PDFs we're asked to convert are grainy, and some even have watermarks to contend with (which probably causes the misinterpretation in the conversion). Of course, we're not working with the original PDF, so we can't remove the watermark either.

In answer to "If you are entitled to edit these legal documents, couldn't you ask the originator for the document from which the PDF was created?": we always ask, and they never comply; it's something to do with attorney-client privilege. Bunch of baloney, because they know we're going to reproduce the document anyway.

In terms of text boxes or frames: I never thought of that one. What's the distinguishing factor between the two? When I looked up text boxes and frames in Word Help, it talked about when it is best to use text boxes and when to use frames (and when it talked about frames, it was in the context of web pages). I guess it's quite possible that a PDF was taken straight off the web, and that was indeed the funky PDF I was talking about earlier.

But since I have your attention, can you tell me how to get rid of that pesky drawing canvas surrounding one of my text boxes without getting rid of the text box itself?

Thanks as always for your help.

"Graham Mayor" wrote:

> Are you sure they are text boxes and not frames?
>
> PDF was always intended as a read-only graphical format. Conversion to a
> Word document can, as you have found, be problematical. You need better
> OCR software to avoid the issue in the first place. Try FineReader, which
> does not use text boxes to format the document and can be fine-tuned to
> produce better results, but conversion from PDF to Word is always going
> to be a laborious process.
>
> If you are entitled to edit these legal documents, couldn't you ask the
> originator for the document from which the PDF was created?
>
> --
> <>>< ><<> ><<> <>>< ><<> <>>< <>><<>
> Graham Mayor - Word MVP
>
> My web site www.gmayor.com
> Word MVP web site http://word.mvps.org
> <>>< ><<> ><<> <>>< ><<> <>>< <>><<>
>
>
> jerem wrote:
> > Hey Greg,
> >
> > Nice to hear from you. The reason I want to strip the text from
> > the text box is this: I predominantly work with large textual legal
> > documents. Sometimes these documents come in the form of PDFs, and
> > what's needed is to convert the PDF into a Word document. Sometimes
> > in the conversion process, the PDF conversion software has trouble
> > identifying some text properly and will convert paragraphs into text
> > boxes. So I may end up with a 100-page document that has 20 or more
> > text boxes in it, which presents a problem when I then need to style
> > the document with numbering schemes and other styles. It's a very big
> > nuisance to have to go into each text box manually, grab the text out
> > and delete the text box. So being able to use a macro to strip out all
> > the text from the text boxes, then copy the entire conversion and do a
> > Paste Special so that I have nothing but unadulterated text, is very
> > helpful.
> >
> > I've tried your macro, and the results I get are gray shaded areas of
> > text (the text that was in the text boxes) and, right below those
> > paragraphs, empty gray shaded text boxes. This is the result for all
> > the text boxes.
> >
> > The code below (I only added the last part, to take out the beginning
> > and ending markers of each text box) is right on target for taking
> > the text out of the text box, placing the text right where the text
> > box was and, finally, deleting the text box.
> > The only problem I had
> > with one document when using this macro was on one really funky PDF
> > conversion: it halted the macro at a table, and I got a message to the
> > effect of "this is not a shape". I'm paraphrasing, but it was
> > something along the lines of "this doesn't fall into the Shape
> > category", which makes me wonder why it halted on it at all and why
> > it didn't bypass it. The only other problem with the macro below is
> > that if a drawing canvas is around a text box, the macro ignores that
> > text box entirely, which spawns another question: how do you get rid
> > of that nuisance of a drawing canvas? I hit Escape, which makes it
> > disappear, but then when you click on the text box, it pops right
> > back up.
> >
> > Anyway, try this macro out; it works quite nicely.
> >
> > Sub RemoveTextBoxText()
> >
> >     ' GrabTextFromTextBox Macro
> >     ' Copies the text from each text box in the document, deletes the
> >     ' text box and inserts the text where each text box once was:
> >
> >     Dim shp As Shape
> >     Dim oRngAnchor As Range
> >     Dim sString As String
> >     For Each shp In ActiveDocument.Shapes
> >         If shp.Type = msoTextBox Then
> >             ' copy text to string, without last paragraph mark
> >             sString = Left(shp.TextFrame.TextRange.Text, _
> >                 shp.TextFrame.TextRange.Characters.Count - 1)
> >             If Len(sString) > 0 Then
> >                 ' set the range to insert the text
> >                 Set oRngAnchor = shp.Anchor.Paragraphs(1).Range
> >                 ' insert the text box text before the range object
> >                 oRngAnchor.InsertBefore _
> >                     "Textbox start << " & sString & " >> Textbox end"
> >             End If
> >             shp.Delete
> >         End If
> >     Next shp
> >     ' Strip out beginning and ending text box markers
> >     Selection.HomeKey Unit:=wdStory
> >     Selection.Find.ClearFormatting
> >     Selection.Find.Replacement.ClearFormatting
> >     With Selection.Find
> >         .Text = "Textbox start << "
> >         .Replacement.Text = ""
> >         .Forward = True
> >         ' .Wrap = wdFindContinue
> >         .Format = False
> >         .MatchCase = True
> >         .MatchWholeWord = False
> >         .MatchWildcards = False
> >         .MatchSoundsLike = False
> >         .MatchAllWordForms = False
> >     End With
> >     Selection.Find.Execute Replace:=wdReplaceAll
> >     With Selection.Find
> >         ' include the leading space so none is left after the text
> >         .Text = " >> Textbox end"
> >         .Replacement.Text = ""
> >         .Forward = True
> >         .Wrap = wdFindContinue
> >         .Format = False
> >         .MatchCase = True
> >         .MatchWholeWord = False
> >         .MatchWildcards = False
> >         .MatchSoundsLike = False
> >         .MatchAllWordForms = False
> >     End With
> >     Selection.Find.Execute Replace:=wdReplaceAll
> > End Sub
> >
> > "Greg Maxey" wrote:
> >
> >> jerem,
> >>
> >> It is the "right back into the position the textbox resided" piece
> >> that may prove impossible. You see, a text box is "anchored" to a
> >> paragraph, and that paragraph may or may not be where the text box
> >> is. I have read what you say you want to do, but I don't really know
> >> why. If you want the text right where the text box was, then why not
> >> just remove the borders from the box?
> >>
> >> Sub Macro1()
> >>     Dim oShp As Shape
> >>     For Each oShp In ActiveDocument.Shapes
> >>         If oShp.Type = msoTextBox Then
> >>             oShp.TextFrame.TextRange.Copy
> >>             oShp.Anchor.Paste
> >>             oShp.Delete
> >>         End If
> >>     Next
> >> End Sub
> >>
> >>
> >> jerem wrote:
> >>> I've literally spent hours (googling every which way of posing this
> >>> question: select text in text box, highlight text from text box,
> >>> copy text from text box, set focus to text box, etc.) trying to
> >>> figure out how to do this, with no success:
> >>>
> >>> Have a macro scan through an entire document looking for text boxes;
> >>> at every occurrence of a text box, copy the text in the text box,
> >>> delete the text box and paste the text just copied from the text box
> >>> right back into the position where the text box resided.
> >>> (And yes, I've looked at the website for copying from the
> >>> clipboard.) So, with the code below I've been able to find the
> >>> text boxes and delete them, but I cannot for the life of me figure
> >>> out how to actually get into those text boxes to copy the text.
> >>> H E L P! I think I'm going to have to drink a whole bottle of wine
> >>> now to console myself. If you're in a delightful mood, documentation
> >>> would be lovely, but not absolutely necessary. As always, thanks
> >>> in advance for your help.
> >>>
> >>>
> >>> Sub Macro1()
> >>> '
> >>> ' Macro1 Macro
> >>>
> >>>
> >>>     Dim i As Long
> >>>     For i = ActiveDocument.Shapes.Count To 1 Step -1
> >>>         Selection.GoTo What:=wdGoToGraphic, Which:=wdGoToFirst, _
> >>>             Count:=i, Name:=""
> >>>
> >>>         'ActiveDocument.Shapes("Text Box", i).Select
> >>>         'Selection.WholeStory
> >>>         'Selection.Copy
> >>>
> >>>         Selection.Delete
> >>>
> >>>     Next i
> >>>
> >>> End Sub
> >>
> >> --
> >> Greg Maxey
> >>
> >> See my web site http://gregmaxey.mvps.org
> >> for an eclectic collection of Word Tips.
> >>
> >> "It is not the critic who counts, not the man who points out how the
> >> strong man stumbles, or where the doer of deeds could have done them
> >> better. The credit belongs to the man in the arena, whose face is
> >> marred by dust and sweat and blood, who strives valiantly...who knows
> >> the great enthusiasms, the great devotions, who spends himself in a
> >> worthy cause, who at the best knows in the end the triumph of high
> >> achievement, and who at the worst, if he fails, at least fails while
> >> daring greatly, so that his place shall never be with those cold and
> >> timid souls who neither know victory nor defeat." - TR
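The RemoveTextBoxText macro above deletes shapes while a For Each loop is still walking ActiveDocument.Shapes, and in Word that can make the loop skip the shape that follows each deleted one, so some boxes may survive a single run. A minimal sketch of an alternative, assuming plain text is all that is wanted (formatting inside the boxes is discarded) and that only text boxes in the main body matter (shapes in headers and footers are not in this collection): loop backwards by index, read TextFrame.TextRange.Text directly instead of going through the Selection or the clipboard, and insert the text as its own paragraph ahead of the anchor paragraph, so no start/end markers or Find/Replace pass are needed. As Greg points out, the anchor paragraph is not necessarily where the box appeared on the page. The macro name MoveTextBoxTextToAnchor is hypothetical.

```vba
' Hypothetical macro name. Moves the plain text of each text box in the
' main body to a new paragraph just before the box's anchor paragraph,
' then deletes the box. Formatting inside the boxes is not kept.
Sub MoveTextBoxTextToAnchor()
    Dim i As Long
    Dim shp As Shape
    Dim sText As String

    ' Count down so deleting a shape never shifts the ones still to visit.
    For i = ActiveDocument.Shapes.Count To 1 Step -1
        Set shp = ActiveDocument.Shapes(i)
        If shp.Type = msoTextBox Then
            ' Read the text directly; no Selection or clipboard involved.
            sText = shp.TextFrame.TextRange.Text
            ' Drop the final paragraph mark so it is not doubled below.
            If Right$(sText, 1) = vbCr Then sText = Left$(sText, Len(sText) - 1)
            If Len(sText) > 0 Then
                ' The trailing vbCr makes the box text a paragraph of its own.
                shp.Anchor.Paragraphs(1).Range.InsertBefore sText & vbCr
            End If
            shp.Delete
        End If
    Next i
End Sub
```

Working from the last shape to the first also means that boxes anchored to the same paragraph end up in the same order as their index numbers.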
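The RemoveTextBoxText macro above skips text boxes that sit on a drawing canvas because, in that case, the member of ActiveDocument.Shapes is the canvas itself (its Type is msoCanvas), and the text boxes are items inside it, reachable through the canvas's CanvasItems collection. A sketch, assuming plain text is enough and that each canvas holding text can be removed as a whole; deleting a canvas also deletes anything else drawn on it, so it is safest to run this on a copy of the document. The macro name is hypothetical.

```vba
' Hypothetical macro name. For each drawing canvas in the main body,
' gathers the text of the text boxes drawn on it, inserts that text
' before the canvas's anchor paragraph and deletes the whole canvas.
Sub MoveCanvasTextBoxTextToAnchor()
    Dim i As Long
    Dim j As Long
    Dim shp As Shape
    Dim sText As String
    Dim sAll As String

    For i = ActiveDocument.Shapes.Count To 1 Step -1
        Set shp = ActiveDocument.Shapes(i)
        If shp.Type = msoCanvas Then
            sAll = ""
            ' Text boxes on a canvas are CanvasItems, not members of Shapes.
            For j = 1 To shp.CanvasItems.Count
                If shp.CanvasItems(j).Type = msoTextBox Then
                    sText = shp.CanvasItems(j).TextFrame.TextRange.Text
                    If Right$(sText, 1) = vbCr Then sText = Left$(sText, Len(sText) - 1)
                    If Len(sText) > 0 Then sAll = sAll & sText & vbCr
                End If
            Next j
            ' Leave canvases that held no text (e.g. real drawings) alone.
            If Len(sAll) > 0 Then
                shp.Anchor.Paragraphs(1).Range.InsertBefore sAll
                shp.Delete
            End If
        End If
    Next i
End Sub
```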
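Before deleting anything, it may help to see what the converter actually produced: which floating shapes exist, what type each one is, and which paragraph each is anchored to (Greg's point about anchors). That may also help narrow down where a macro stops, as in the "not a shape" halt at a table. A small sketch with hypothetical names (ShapeReport, ListShapes); the type numbers come from MsoShapeType, for example 17 for msoTextBox and 20 for msoCanvas. Tables and frames are counted separately because they are part of the text, not the drawing layer, so they never appear in Shapes.

```vba
' Hypothetical helper. Builds a one-line-per-shape report of the floating
' shapes in the main body, plus counts of tables and frames.
Function ShapeReport() As String
    Dim i As Long
    Dim shp As Shape
    Dim sAnchor As String
    Dim sOut As String

    For i = 1 To ActiveDocument.Shapes.Count
        Set shp = ActiveDocument.Shapes(i)
        ' First 40 characters of the anchor paragraph, paragraph marks flattened.
        sAnchor = Left$(Replace(shp.Anchor.Paragraphs(1).Range.Text, vbCr, " "), 40)
        sOut = sOut & i & vbTab & shp.Name & vbTab & "Type=" & shp.Type & _
            vbTab & sAnchor & vbCrLf
    Next i
    ' Tables and frames live in the text layer, so they are not in Shapes.
    sOut = sOut & ActiveDocument.Tables.Count & " table(s), " & _
        ActiveDocument.Frames.Count & " frame(s)" & vbCrLf
    ShapeReport = sOut
End Function

Sub ListShapes()
    ' Shows the report in the Immediate window (Ctrl+G in the VBA editor).
    Debug.Print ShapeReport()
End Sub
```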
From: Graham Mayor on 21 Oct 2009 07:01
The principal difference between a frame and a text box is that a frame is in the text layer of the document (it is a paragraph formatting attribute), whereas a text box is in the drawing layer.

As for the drawing canvas, it might be simpler to configure Word not to create the canvas at all. Otherwise, select the content of the canvas and cut it to the clipboard, delete the canvas, then paste the content back again; that should work.
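Because a frame is paragraph formatting rather than a drawing object, a macro that walks ActiveDocument.Shapes never sees frames; they are reached through the document's Frames collection instead. A minimal sketch, assuming Frame.Delete behaves like the Remove Frame button and strips only the frame formatting, leaving the paragraphs where they are; the name RemoveAllFrames is hypothetical, and it is worth trying on a copy first.

```vba
' Hypothetical macro name. Removes the frame formatting from every frame
' in the main body; the framed text is expected to stay as ordinary
' paragraphs.
Sub RemoveAllFrames()
    Dim i As Long
    ' Count down because each Delete shrinks the Frames collection.
    For i = ActiveDocument.Frames.Count To 1 Step -1
        ActiveDocument.Frames(i).Delete
    Next i
End Sub
```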
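Graham's first suggestion, stopping Word from creating the canvas, can also be set from VBA. A sketch, assuming the Options.AutoCreateNewDrawings property is the setting behind "Automatically create drawing canvas when inserting AutoShapes" (Tools > Options > General in Word 2002/2003); the macro name is hypothetical. The setting only affects shapes drawn from then on, so a canvas that already exists still has to be removed by cutting its content, deleting it and pasting the content back.

```vba
' Hypothetical macro name. Assumes Options.AutoCreateNewDrawings controls
' whether newly inserted AutoShapes are placed in a drawing canvas.
Sub StopAutoCanvas()
    Options.AutoCreateNewDrawings = False
End Sub
```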