From: kvasikonkav on 2 Jul 2010 16:39 Hi- First, sorry for the long post. I don't want to annoy any of you, especially since I am in desperate need of your help. I've noticed that a lot of questions get answered effectively only after the OP states what the "big picture" is, so I am going to try to do that. I have an unbalanced (Different start and end dates, as well as left and right censored) panel (Company x Year) dataset. I am trying to test whether one specific variable, (call it X1) is a significant predictor of (1) the probability that an event, (call it X2 where X2=1,0) occurs in a specific year (logistic or multinomial logistic) (2) how long a company "lives" before the event occurs (PHREG or competing hazards model) (3) the characterstics of the company after the event occurs, (call it a vector X3) I am trying to "control for the usual suspects" (call it a vector X4) in my regressions. Regardless of the wisdom of doing so, I am following results from other published papers as my baseline model. In addition to the X1 and X4, I would like to test the significance of several other variables of interest. I have a set of characteristics or levels (call it vector X5) for each (company, year) observation. I woudl like to inclede these in my regression (I actually know how to do this all on my own!) I would like to create a a set of variables or differences (call it vector X6) that measure how far from"THE MEAN" the company is immediately before the event for each of the characteristics in X5. "THE MEAN" is a bit hard. I would like to group the data by some elements of X5 that make them "similar" (e.g. industry, year, size, etc or combination of them, such as INDUSTRYxYEAR groupings) and take the "MEAN" of the group. For "MEAN" itself, I think that a plain old mean is not a great idea, so maybe a windsorized mean or some other measure of central tendency. For robustness, I would like to use several different measures of central tendency. Once I have a decent set of different definitions of "the mean" (generated by different groupings of the data), I am unsure if a plain difference, or ratio, would be more informative in the regression. Let's say I'm talking about copany j, and I have a group, call it J, to which j belongs. Which ie better? Difference = X5(j) - MEAN(X5|J). Ratio = X5(j) / MEAN(X5|J). I would also like to see if "how the company got here" matters in the regressions. for each company, the X5 vector is serially correlated through time. Lagged X5 is insanely colinear with X5, and power is all gone, and estimated variance is inflated. So, I am faced with a similar problem as above - I can either do a difference = X5- Lagged(X5), or a ratio = X5/Lagged(X5) So, in summary, I have a hypothesis that most companies are in or close to equilibrium most of the time. If a company deviates ( due to some unobservable shock) from the average for at least some elements of X5, the firm is more likely to have the defined event, and and the event will move the firm back toward equilibrium. Any comments on how I plan on testing the hypothesis, as described above, would REALLY be appreciated. Now, onto my specific problems: In order to get a few "Group Mean" estimates, I have been running Proc Univariate. Somehow, it seems insanely slow. I was able to speed it up a bit by switching to "By" instead of "Class", but My code is pasted below. Any suggestion on how to improve speed and/or be more clever about finding measures of central tendency? /*CODE*/ ods select WinsorizedMeans; ods output WINSORIZEDmeans=pooledest; proc univariate data=b01 trimmed=1 .02 winsorized=.02 robustscale /*noprint*/; Var csGP csOIBDP csOIADP csPI csIB csNI csCHE csACT csPPENT csINTAN csAT csDLC csLCT csDLTT csLT csCSTK csCAPS csRE csTSTK csCEQ csPSTK csSEQ csLSE csOANCF csCAPX csAQC csIVNCF csPRSTKC csSSTK csDV csDLTIS csDLTR csDLCCH csFINCF csCHECH SaleToAsset NWC SaleToNWC NetCapx ROA ROE DivRatio MVEquity BtoM MLev IntCov ; run; data comp.pooledest; set pooledest; run; quit; /* --------------------------------------------------------------------------- --------------------------------------------------------------------------- This section takes the winsorized mean of several variables and stores them. This data is partitioned by a single variable: YEAR, FF12, SIZE, BEME --------------------------------------------------------------------------- --------------------------------------------------------------------------- */ proc sort data=b01; by year; run; ods select WinsorizedMeans; ods output WINSORIZEDmeans=Year; proc univariate data=b01 trimmed=1 .02 winsorized=.02 robustscale /*noprint*/; by year /*class year*/; Var csGP csOIBDP csOIADP csPI csIB csNI csCHE csACT csPPENT csINTAN csAT csDLC csLCT csDLTT csLT csCSTK csCAPS csRE csTSTK csCEQ csPSTK csSEQ csLSE csOANCF csCAPX csAQC csIVNCF csPRSTKC csSSTK csDV csDLTIS csDLTR csDLCCH csFINCF csCHECH SaleToAsset NWC SaleToNWC NetCapx ROA ROE DivRatio MVEquity BtoM MLev IntCov ; run; data comp.year; set year; run; quit; /* --------------------------------------------------------------------------- --------------------------------------------------------------------------- This section takes the winsorized mean of several variables and stores them. This data is partitioned by a pair of variable: YEARxFF12, YEARxSIZE, YEARxBEME, FF12xSIZE, FF12xBEME, SIZExBEME --------------------------------------------------------------------------- --------------------------------------------------------------------------- */ proc sort data=b01; by year ff12 ; run; ods select WinsorizedMeans; ods output WINSORIZEDmeans=YRxFF; proc univariate data=b01 trimmed=1 .02 winsorized=.02 robustscale /*noprint*/; by year ff12 ; Var csGP csOIBDP csOIADP csPI csIB csNI csCHE csACT csPPENT csINTAN csAT csDLC csLCT csDLTT csLT csCSTK csCAPS csRE csTSTK csCEQ csPSTK csSEQ csLSE csOANCF csCAPX csAQC csIVNCF csPRSTKC csSSTK csDV csDLTIS csDLTR csDLCCH csFINCF csCHECH SaleToAsset NWC SaleToNWC NetCapx ROA ROE DivRatio MVEquity BtoM MLev IntCov ; run; data comp.YRxFF; set YRxFF; run; /* --------------------------------------------------------------------------- --------------------------------------------------------------------------- This section takes the winsorized mean of several variables and stores them. This data is partitioned by a triplet of variable: YEARxFF12xSIZE (no BEME), YEARxFF12xBEME (no SIZE), YEARxSIZExBEME (no FF12), FF12xSIZExBEME (no YEAR) --------------------------------------------------------------------------- --------------------------------------------------------------------------- */ proc sort data=b01; by year ff12 size ; run; ods select WinsorizedMeans; ods output WINSORIZEDmeans=threenoBM; proc univariate data=b01 trimmed=1 .02 winsorized=.02 robustscale /*noprint*/; by year ff12 size ; Var csGP csOIBDP csOIADP csPI csIB csNI csCHE csACT csPPENT csINTAN csAT csDLC csLCT csDLTT csLT csCSTK csCAPS csRE csTSTK csCEQ csPSTK csSEQ csLSE csOANCF csCAPX csAQC csIVNCF csPRSTKC csSSTK csDV csDLTIS csDLTR csDLCCH csFINCF csCHECH SaleToAsset NWC SaleToNWC NetCapx ROA ROE DivRatio MVEquity BtoM MLev IntCov ; run; data comp.threenoBM; set threenoBM; run; --------------------------------------------------------------------------- --------------------------------------------------------------------------- This section takes the winsorized mean of several variables and stores them. This data is partitioned by ALL variables: YEAR, FF12, SIZE, BEME --------------------------------------------------------------------------- --------------------------------------------------------------------------- */ proc sort data=b01; by year ff12 size beme; run; ods select WinsorizedMeans; ods output WINSORIZEDmeans=fourway; proc univariate data=b01 trimmed=1 .02 winsorized=.02 robustscale /*noprint*/; by year ff12 size beme; Var csGP csOIBDP csOIADP csPI csIB csNI csCHE csACT csPPENT csINTAN csAT csDLC csLCT csDLTT csLT csCSTK csCAPS csRE csTSTK csCEQ csPSTK csSEQ csLSE csOANCF csCAPX csAQC csIVNCF csPRSTKC csSSTK csDV csDLTIS csDLTR csDLCCH csFINCF csCHECH SaleToAsset NWC SaleToNWC NetCapx ROA ROE DivRatio MVEquity BtoM MLev IntCov ; run; data comp.fourway; set fourway; run; proc sort data= year; by year; run; proc transpose data=year out=comp.year1; by year; id varname; var mean stdmean; run; PROC EXPORT DATA= WORK.year1 OUTFILE= "R:\Dissertation\ChartsAndTables\FinancialStatements.XLS" DBMS=EXCEL REPLACE; SHEET="Year"; RUN;
From: Tom Abernathy on 2 Jul 2010 20:01 I don't know anything about Windsorized means, but you can eliminate all but the last PROC SORT. If a dataset is sorted by YEAR FF12 SIZE BEME then it is sorted by YEAR and by YEAR FF12 ... On Jul 2, 4:39 pm, kvasikonkav <kvasikon...(a)gmail.com> wrote: > Hi- > > First, sorry for the long post. I don't want to annoy any of you, > especially since I am in desperate need of your help. I've noticed > that a lot of questions get answered effectively only after the OP > states what the "big picture" is, so I am going to try to do that. > > I have an unbalanced (Different start and end dates, as well as left > and right censored) panel (Company x Year) dataset. > > I am trying to test whether one specific variable, (call it X1) is a > significant predictor of > (1) the probability that an event, (call it X2 where X2=1,0) occurs > in a specific year (logistic or multinomial logistic) > (2) how long a company "lives" before the event occurs (PHREG or > competing hazards model) > (3) the characterstics of the company after the event occurs, (call > it a vector X3) > > I am trying to "control for the usual suspects" (call it a vector X4) > in my regressions. Regardless of the wisdom of doing so, I am > following results from other published papers as my baseline model. > > In addition to the X1 and X4, I would like to test the significance of > several other variables of interest. > > I have a set of characteristics or levels (call it vector X5) for each > (company, year) observation. I woudl like to inclede these in my > regression (I actually know how to do this all on my own!) > > I would like to create a a set of variables or differences (call it > vector X6) that measure how far from"THE MEAN" the company is > immediately before the event for each of the characteristics in X5. > > "THE MEAN" is a bit hard. I would like to group the data by some > elements of X5 that make them "similar" (e.g. industry, year, size, > etc or combination of them, such as INDUSTRYxYEAR groupings) and take > the "MEAN" of the group. For "MEAN" itself, I think that a plain old > mean is not a great idea, so maybe a windsorized mean or some other > measure of central tendency. For robustness, I would like to use > several different measures of central tendency. > > Once I have a decent set of different definitions of "the > mean" (generated by different groupings of the data), I am unsure if a > plain difference, or ratio, would be more informative in the > regression. Let's say I'm talking about copany j, and I have a group, > call it J, to which j belongs. Which ie better? Difference = X5(j) - > MEAN(X5|J). Ratio = X5(j) / MEAN(X5|J). > > I would also like to see if "how the company got here" matters in the > regressions. for each company, the X5 vector is serially correlated > through time. Lagged X5 is insanely colinear with X5, and power is all > gone, and estimated variance is inflated. So, I am faced with a > similar problem as above - I can either do a difference = X5- > Lagged(X5), or a ratio = X5/Lagged(X5) > > So, in summary, I have a hypothesis that most companies are in or > close to equilibrium most of the time. If a company deviates ( due to > some unobservable shock) from the average for at least some elements > of X5, the firm is more likely to have the defined event, and and the > event will move the firm back toward equilibrium. Any comments on how > I plan on testing the hypothesis, as described above, would REALLY be > appreciated. > > Now, onto my specific problems: > > In order to get a few "Group Mean" estimates, I have been running Proc > Univariate. Somehow, it seems insanely slow. I was able to speed it > up a bit by switching to "By" instead of "Class", but My code is > pasted below. Any suggestion on how to improve speed and/or be more > clever about finding measures of central tendency? > > /*CODE*/ > > ods select WinsorizedMeans; > ods output WINSORIZEDmeans=pooledest; > proc univariate data=b01 > trimmed=1 .02 > winsorized=.02 > robustscale > /*noprint*/; > Var > csGP csOIBDP csOIADP csPI csIB csNI csCHE > csACT csPPENT csINTAN csAT csDLC csLCT csDLTT csLT > csCSTK csCAPS csRE csTSTK csCEQ csPSTK csSEQ csLSE > csOANCF csCAPX csAQC csIVNCF csPRSTKC csSSTK csDV > csDLTIS csDLTR csDLCCH csFINCF csCHECH > SaleToAsset NWC SaleToNWC NetCapx ROA ROE DivRatio > MVEquity BtoM MLev IntCov > ; > run; > data comp.pooledest; set pooledest; run; > quit; > > /* > --------------------------------------------------------------------------- > --------------------------------------------------------------------------- > This section takes the winsorized mean of several variables and stores > them. > This data is partitioned by a single variable: > YEAR, FF12, SIZE, BEME > --------------------------------------------------------------------------- > --------------------------------------------------------------------------- > */ > > proc sort data=b01; by year; run; > ods select WinsorizedMeans; > ods output WINSORIZEDmeans=Year; > proc univariate data=b01 > trimmed=1 .02 > winsorized=.02 > robustscale > /*noprint*/; > by year /*class year*/; > Var > csGP csOIBDP csOIADP csPI csIB csNI csCHE > csACT csPPENT csINTAN csAT csDLC csLCT csDLTT csLT > csCSTK csCAPS csRE csTSTK csCEQ csPSTK csSEQ csLSE > csOANCF csCAPX csAQC csIVNCF csPRSTKC csSSTK csDV > csDLTIS csDLTR csDLCCH csFINCF csCHECH > SaleToAsset NWC SaleToNWC NetCapx ROA ROE DivRatio > MVEquity BtoM MLev IntCov > ; > run; > data comp.year; set year; run; > quit; > > /* > --------------------------------------------------------------------------- > --------------------------------------------------------------------------- > This section takes the winsorized mean of several variables and stores > them. > This data is partitioned by a pair of variable: > YEARxFF12, YEARxSIZE, YEARxBEME, FF12xSIZE, FF12xBEME, SIZExBEME > --------------------------------------------------------------------------- > --------------------------------------------------------------------------- > */ > > proc sort data=b01; > by year ff12 ; > run; > ods select WinsorizedMeans; > ods output WINSORIZEDmeans=YRxFF; > proc univariate data=b01 > trimmed=1 .02 > winsorized=.02 > robustscale > /*noprint*/; > by year ff12 ; > Var > csGP csOIBDP csOIADP csPI csIB csNI csCHE > csACT csPPENT csINTAN csAT csDLC csLCT csDLTT csLT > csCSTK csCAPS csRE csTSTK csCEQ csPSTK csSEQ csLSE > csOANCF csCAPX csAQC csIVNCF csPRSTKC csSSTK csDV > csDLTIS csDLTR csDLCCH csFINCF csCHECH > SaleToAsset NWC SaleToNWC NetCapx ROA ROE DivRatio > MVEquity BtoM MLev IntCov > ; > run; > data comp.YRxFF; set YRxFF; run; > > /* > --------------------------------------------------------------------------- > --------------------------------------------------------------------------- > This section takes the winsorized mean of several variables and stores > them. > This data is partitioned by a triplet of variable: > YEARxFF12xSIZE (no BEME), YEARxFF12xBEME (no SIZE), > YEARxSIZExBEME (no FF12), FF12xSIZExBEME (no YEAR) > --------------------------------------------------------------------------- > --------------------------------------------------------------------------- > */ > > proc sort data=b01; > by year ff12 size ; > run; > ods select WinsorizedMeans; > ods output WINSORIZEDmeans=threenoBM; > proc univariate data=b01 > trimmed=1 .02 > winsorized=.02 > robustscale > /*noprint*/; > by year ff12 size ; > Var > csGP csOIBDP csOIADP csPI csIB csNI csCHE > csACT csPPENT csINTAN csAT csDLC csLCT csDLTT csLT > csCSTK csCAPS csRE csTSTK csCEQ csPSTK csSEQ csLSE > csOANCF csCAPX csAQC csIVNCF csPRSTKC csSSTK csDV > csDLTIS csDLTR csDLCCH csFINCF csCHECH > SaleToAsset NWC SaleToNWC NetCapx ROA ROE DivRatio > MVEquity BtoM MLev IntCov > ; > run; > data comp.threenoBM; set threenoBM; run; > > --------------------------------------------------------------------------- > --------------------------------------------------------------------------- > This section takes the winsorized mean of several variables and stores > them. > This data is partitioned by ALL variables: > YEAR, FF12, SIZE, BEME > --------------------------------------------------------------------------- > --------------------------------------------------------------------------- > */ > > proc sort data=b01; > by year ff12 size beme; > run; > ods select WinsorizedMeans; > ods output WINSORIZEDmeans=fourway; > proc univariate data=b01 > trimmed=1 .02 > winsorized=.02 > robustscale > /*noprint*/; > by year ff12 size beme; > Var > csGP csOIBDP csOIADP csPI csIB csNI csCHE > csACT csPPENT csINTAN csAT csDLC csLCT csDLTT csLT > csCSTK csCAPS csRE csTSTK csCEQ csPSTK csSEQ csLSE > csOANCF csCAPX csAQC csIVNCF csPRSTKC csSSTK csDV > csDLTIS csDLTR csDLCCH csFINCF csCHECH > SaleToAsset NWC SaleToNWC NetCapx ROA ROE DivRatio > MVEquity BtoM MLev IntCov > ; > run; > data comp.fourway; set fourway; run; > > proc sort data= year; by year; run; > proc transpose data=year out=comp.year1; by year; id varname; var mean > stdmean; run; > PROC EXPORT DATA= WORK.year1 > OUTFILE= "R:\Dissertation\ChartsAndTables\FinancialStatements.XLS" > DBMS=EXCEL REPLACE; SHEET="Year"; > RUN;
From: kvasikonkav on 2 Jul 2010 22:49 Tom- Thanks! I'm really tired, so this sort of obvious redundancy is beyond my pea- brain. This will help. On Jul 2, 7:01 pm, Tom Abernathy <tom.aberna...(a)gmail.com> wrote: > I don't know anything about Windsorized means, but you can eliminate > all but the last PROC SORT. > If a dataset is sorted by YEAR FF12 SIZE BEME then it is sorted by > YEAR and by YEAR FF12 ... >
|
Pages: 1 Prev: data step over proc sql Next: Scorecard : Probabailities |