From: MRAB on 5 Jan 2010 14:20 vsoler wrote: > On 5 ene, 19:35, MRAB <pyt...(a)mrabarnett.plus.com> wrote: >> vsoler wrote: >>> Hello, >>> I am acessing an Excel file by means of Win 32 COM technology. >>> For a given cell, I am able to read its formula. I want to make a map >>> of how cells reference one another, how different sheets reference one >>> another, how workbooks reference one another, etc. >>> Hence, I need to parse Excel formulas. Can I do it by means only of re >>> (regular expressions)? >>> I know that for simple formulas such as "=3*A7+5" it is indeed >>> possible. What about complex for formulas that include functions, >>> sheet names and possibly other *.xls files? >>> For example "=Book1!A5+8" should be parsed into ["=","Book1", "!", >>> "A5","+","8"] >>> Can anybody help? Any suggestions? >> Do you mean "how" or do you really mean "whether", ie, get a list of the >> other cells that are referred to by a certain cell, for example, >> "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5"]? > > I'd like to know how to do it, should it be possible. > Something like this should work: references = re.findall(r"\b((?:\w+!)?[A-Za-z]+\d+)\b", formula)
From: vsoler on 5 Jan 2010 14:21 On 5 ene, 20:05, Mensanator <mensana...(a)aol.com> wrote: > On Jan 5, 12:35 pm, MRAB <pyt...(a)mrabarnett.plus.com> wrote: > > > > > vsoler wrote: > > > Hello, > > > > I am acessing an Excel file by means of Win 32 COM technology. > > > For a given cell, I am able to read its formula. I want to make a map > > > of how cells reference one another, how different sheets reference one > > > another, how workbooks reference one another, etc. > > > > Hence, I need to parse Excel formulas. Can I do it by means only of re > > > (regular expressions)? > > > > I know that for simple formulas such as "=3*A7+5" it is indeed > > > possible. What about complex for formulas that include functions, > > > sheet names and possibly other *.xls files? > > > > For example "=Book1!A5+8" should be parsed into ["=","Book1", "!", > > > "A5","+","8"] > > > > Can anybody help? Any suggestions? > > > Do you mean "how" or do you really mean "whether", ie, get a list of the > > other cells that are referred to by a certain cell, for example, > > "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5] > > Ok, although "Book1" would be the default name of a workbook, with > default > worksheets labeled "Sheet1". "Sheet2", etc. > > If I had a worksheet named "Sheety" that wanted to reference a cell on > "Sheetx" > OF THE SAME WORKBOOK, it would be =Sheet2!A7. If the reference was to > a completely > different workbook (say Book1 with worksheets labeled "Sheet1", > "Sheet2") then > the cell might have =[Book1]Sheet1!A7. > > And don't forget the $'s! You may see =[Book1]Sheet1!$A$7. Yes, Mensanator, but... what re should I use? I'm looking for the re statement. No doubt you can help! Thank you.
From: Tim Chase on 5 Jan 2010 14:49 vsoler wrote: > Hence, I need to parse Excel formulas. Can I do it by means only of re > (regular expressions)? > > I know that for simple formulas such as "=3*A7+5" it is indeed > possible. What about complex for formulas that include functions, > sheet names and possibly other *.xls files? Where things start getting ugly is when you have nested function calls, such as =if(Sum(A1:A25)>42,Min(B1:B25), if(Sum(C1:C25)>3.14, (Min(C1:C25)+3)*18,Max(B1:B25))) Regular expressions don't do well with nested parens (especially arbitrarily-nesting-depth such as are possible), so I'd suggest going for a full-blown parsing solution like pyparsing. If you have fair control over what can be contained in the formulas and you know they won't contain nested parens/functions, you might be able to formulate some sort of "kinda, sorta, maybe parses some forms of formulas" regexp. -tkc
From: vsoler on 5 Jan 2010 14:54 On 5 ene, 20:21, vsoler <vicente.so...(a)gmail.com> wrote: > On 5 ene, 20:05, Mensanator <mensana...(a)aol.com> wrote: > > > > > On Jan 5, 12:35 pm, MRAB <pyt...(a)mrabarnett.plus.com> wrote: > > > > vsoler wrote: > > > > Hello, > > > > > I am acessing an Excel file by means of Win 32 COM technology. > > > > For a given cell, I am able to read its formula. I want to make a map > > > > of how cells reference one another, how different sheets reference one > > > > another, how workbooks reference one another, etc. > > > > > Hence, I need to parse Excel formulas. Can I do it by means only of re > > > > (regular expressions)? > > > > > I know that for simple formulas such as "=3*A7+5" it is indeed > > > > possible. What about complex for formulas that include functions, > > > > sheet names and possibly other *.xls files? > > > > > For example "=Book1!A5+8" should be parsed into ["=","Book1", "!", > > > > "A5","+","8"] > > > > > Can anybody help? Any suggestions? > > > > Do you mean "how" or do you really mean "whether", ie, get a list of the > > > other cells that are referred to by a certain cell, for example, > > > "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5] > > > Ok, although "Book1" would be the default name of a workbook, with > > default > > worksheets labeled "Sheet1". "Sheet2", etc. > > > If I had a worksheet named "Sheety" that wanted to reference a cell on > > "Sheetx" > > OF THE SAME WORKBOOK, it would be =Sheet2!A7. If the reference was to > > a completely > > different workbook (say Book1 with worksheets labeled "Sheet1", > > "Sheet2") then > > the cell might have =[Book1]Sheet1!A7. > > > And don't forget the $'s! You may see =[Book1]Sheet1!$A$7. > > Yes, Mensanator, but... what re should I use? I'm looking for the re > statement. No doubt you can help! > > Thank you. Let me give you an example: >>> import re >>> re.split("([^0-9])", "123+456*/") [123, +, 456, *, , /, ] I find it excellent that one single statement is able to do a lexical analysis of an expression! If the expression contains variables, such as A12 or B9, I can try another re expression. Which one should I use? And if my expression contains parenthesis? And the sin() function? Vicente Soler
From: MRAB on 5 Jan 2010 17:37
Mensanator wrote: > On Jan 5, 12:35 pm, MRAB <pyt...(a)mrabarnett.plus.com> wrote: >> vsoler wrote: >>> Hello, >>> I am acessing an Excel file by means of Win 32 COM technology. >>> For a given cell, I am able to read its formula. I want to make a map >>> of how cells reference one another, how different sheets reference one >>> another, how workbooks reference one another, etc. >>> Hence, I need to parse Excel formulas. Can I do it by means only of re >>> (regular expressions)? >>> I know that for simple formulas such as "=3*A7+5" it is indeed >>> possible. What about complex for formulas that include functions, >>> sheet names and possibly other *.xls files? >>> For example "=Book1!A5+8" should be parsed into ["=","Book1", "!", >>> "A5","+","8"] >>> Can anybody help? Any suggestions? >> Do you mean "how" or do you really mean "whether", ie, get a list of the >> other cells that are referred to by a certain cell, for example, >> "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5] > > Ok, although "Book1" would be the default name of a workbook, with > default > worksheets labeled "Sheet1". "Sheet2", etc. > > If I had a worksheet named "Sheety" that wanted to reference a cell on > "Sheetx" > OF THE SAME WORKBOOK, it would be =Sheet2!A7. If the reference was to > a completely > different workbook (say Book1 with worksheets labeled "Sheet1", > "Sheet2") then > the cell might have =[Book1]Sheet1!A7. > > And don't forget the $'s! You may see =[Book1]Sheet1!$A$7. I forgot about the dollars! In that case, the regex is: references = re.findall(r"\b((?:\w+!)?\$?[A-Za-z]+\$?\d+)\b", formula) |