From: David Latin on 22 Jul 2010 05:42

Hello,

I am currently working on manipulating data in "vCard"-like format, and have become confused by the actions of the Cases, StringCases and Select functions. Consider the small list:

In[1]:= list = {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR", "MM"} ;

In[2]:= Cases[list, ___~~"END:"~~___]
Out[2]= {}

So pattern-matching obviously does not work with Cases for a list of strings. The documentation for Cases does not refer to patterns in strings, so I tried

In[3]:= StringCases[list, ___~~"END:"~~___]
Out[3]= {{"DTEND:19260412T175900"},{"DTEND:20070207T050000"},{"END:VCALENDAR"},{}}

The problem here is that empty elements can be returned. So next I tried

In[4]:= Select[list, ___~~"END:"~~___]
Out[4]= {}

Obviously not working. Next I tried

In[5]:= Select[ list, StringMatchQ[#, "*END:*"] & ]
Out[5]= {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR"}

This is fine. But what if I only want the "END:" lines and not the "DTEND:" lines ? It may be appropriate to make use of

In[6]:= Select[ list, StringFreeQ[#, "*DTEND:*"] & ]
Out[6]= {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR", "MM"}

Not as expected! But, in the end, what works is:

In[7]:= Select[ list, StringMatchQ[#, "*END:*"] && ! StringMatchQ[#, "*DTEND:*"] & ]
Out[7]= {"END:VCALENDAR"}

I know I could have used "END*" instead of "*END*", but that's not the point here.

My questions then are:
Why doesn't Cases work for a list of strings ?
Why doesn't Select work for patterns with the ~~ operator ?
Why doesn't StringFreeQ act in the same way as !StringMatchQ ?

Any help over this confusion would be very much appreciated!

Thank you,
David
From: Bill Rowe on 23 Jul 2010 07:11

On 7/22/10 at 5:41 AM, d.latin(a)gmail.com (David Latin) wrote:

>Hello, I am currently working on manipulating data in "vCard"-like
>format, and have become confused by the actions of the Cases,
>StringCases and Select functions. Consider the small list:

>In[1]:= list = {"DTEND:19260412T175900", "DTEND:20070207T050000",
>"END:VCALENDAR", "MM"} ;

>In[2]:= Cases[list, ___~~"END:"~~___]
>Out[2]= {}

>So pattern-matching obviously does not work with Cases for a list of
>strings.

Patterns and string patterns simply aren't the same. So, do

In[12]:= Cases[list, _?(StringMatchQ[#, ___ ~~ "END:" ~~ ___] &)]

Out[12]= {DTEND:19260412T175900,DTEND:20070207T050000,END:VCALENDAR}

>The documentation for Cases does not refer to patterns in strings,
>so I tried

>In[3]:= StringCases[list, ___~~"END:"~~___]
>Out[3]= {{"DTEND:19260412T175900"},{"DTEND:20070207T050000"},{"END:VCALENDAR"},{}}

>The problem here is that empty elements can be returned.

That is easily fixed by doing either

In[13]:= DeleteCases[StringCases[list, ___ ~~ "END:" ~~ ___], {}]

Out[13]= {{"DTEND:19260412T175900"}, {"DTEND:20070207T050000"}, {"END:VCALENDAR"}}

or

In[14]:= StringCases[list, ___ ~~ "END:" ~~ ___] /. {} -> Sequence[]

Out[14]= {{"DTEND:19260412T175900"}, {"DTEND:20070207T050000"}, {"END:VCALENDAR"}}

>So next I tried

>In[4]:= Select[list, ___~~"END:"~~___]
>Out[4]= {}

>Obviously not working.

Here, like with Cases a pure function using StringMatchQ will do what you need. That is,

In[15]:= Select[list, StringMatchQ[#, ___ ~~ "END:" ~~ ___] &]

Out[15]= {DTEND:19260412T175900,DTEND:20070207T050000,END:VCALENDAR}

>Next I tried

>In[5]:= Select[ list, StringMatchQ[#, "*END:*"] & ]
>Out[5]= {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR"}

>This is fine. But what if I only want the "END:" lines and not the
>"DTEND:" lines ?

Change the pattern to be matched.
For example,

In[16]:= Select[list, StringMatchQ[#, "END:" ~~ ___] &]

Out[16]= {END:VCALENDAR}

>It may be appropriate to make use of

>In[6]:= Select[ list, StringFreeQ[#, "*DTEND:*"] & ]
>Out[6]= {"DTEND:19260412T175900", "DTEND:20070207T050000", "END:VCALENDAR",
>"MM"}

>Not as expected!

Since StringFreeQ[string, pattern] already tests every substring of string against pattern, and returns true only when none of them matches, it isn't sensible to supply a pattern like ___~~pattern~~___. This just causes Mathematica to do more work than needed to achieve the desired result. So, do

In[17]:= Select[list, StringFreeQ[#, "DTEND:"] &]

Out[17]= {END:VCALENDAR,MM}

Also, note the documentation for StringMatchQ under more information states "... ordinary StringExpression string patterns, as well as abbreviated string patterns containing the following metacharacters:" and specifically states a "*" is interpreted as zero or more characters. The documentation for StringFreeQ does not have any similar statement. So, I suspect that for StringFreeQ, a "*" is taken to be a literal asterisk. Since none of the strings in your list contains a literal asterisk, all of them would be selected if StringFreeQ is interpreting the "*" at each end of your pattern as a literal asterisk. That is consistent with your Out[6].

>But, in the end, what works is:

>In[7]:= Select[ list, StringMatchQ[#, "*END:*"] && ! StringMatchQ[#,
>"*DTEND:*"] & ]
>Out[7]= {"END:VCALENDAR"}

>I know I could have used "END*" instead of "*END*", but that's not
>the point here.

>My questions then are: Why doesn't Cases work for a list of strings
>? Why doesn't Select work for patterns with the ~~ operator ?

Neither Cases nor Select is designed to use string patterns. Cases expects an ordinary expression pattern, and Select expects a function that returns true or false. You can use string patterns with either by wrapping them in a pattern test or pure function that evaluates to true or false, using any of the functions that do accept string patterns as arguments.

>Why doesn't StringFreeQ act in the same way as !StringMatchQ ?

Why are you expecting these to be the same?
StringFreeQ[string, pattern] returns true whenever no substring of string matches pattern. !StringMatchQ[string, pattern] returns true whenever the entire string fails to match pattern. There is a clear difference between matching a substring of a given string and the entire string.
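
To make the substring versus whole-string distinction concrete, here is a small sketch using the list from the original post. It is only an illustration: the variable name s in the Cases example is my own choice, and the results shown in the comments are what I would expect from the definitions above.

```mathematica
list = {"DTEND:19260412T175900", "DTEND:20070207T050000",
   "END:VCALENDAR", "MM"};

(* StringFreeQ looks at every substring: "DTEND:" occurs inside the
   first two strings, so they are rejected *)
Select[list, StringFreeQ[#, "DTEND:"] &]
(* {"END:VCALENDAR", "MM"} *)

(* StringMatchQ compares the WHOLE string with the pattern: no element
   is exactly "DTEND:", so negating it keeps everything *)
Select[list, ! StringMatchQ[#, "DTEND:"] &]
(* {"DTEND:19260412T175900", "DTEND:20070207T050000",
    "END:VCALENDAR", "MM"} *)

(* Anchoring the whole-string match at the start picks only "END:" lines *)
Select[list, StringMatchQ[#, "END:" ~~ ___] &]
(* {"END:VCALENDAR"} *)

(* The same test used with Cases, via a condition on an ordinary pattern *)
Cases[list, s_String /; StringMatchQ[s, "END:" ~~ ___]]
(* {"END:VCALENDAR"} *)
```

The two negative tests agree only when the pattern given to StringMatchQ is itself padded to cover the whole string, for example ! StringMatchQ[#, ___ ~~ "DTEND:" ~~ ___] &.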