**************************************************************;
* TAB030329s2.sas *;
* proc tabulate を使おう *;
* session 2 : table文で表を指定 *;
* 2003.3.29, 翔 *;
* 2008.3.13, 改定2 *;
**************************************************************;
options nocenter;
libname mydata "c:\";
/*
前セッションと同じく
1年間にクレジットカードを利用したホテル利用者の特徴を学習させることを目的としたのデータ。1,450人分を使って,実行例を示します.
*/
data creditcard;
infile "c:\train_space.txt" firstobs=2 dlm='09'x;; /*外部ファイルから入力*/
input id hotel xgender $ xage xincome xmarital $ xusage xretail xrest xenter xtrans xother; /*読み込むデータを指定*/
run;
proc print data=creditcard(obs=5);run;
/*
OBS id hotel xgender xage xincome xmarital xusage xretail xrest xenter xtrans xother
1 1 0 female 60 . single 300000 100000 1 0 0 30000
2 2 1 male 40 . single 30000 10000 0 0 0 30000
3 3 0 male 40 . single 100000 100000 1 0 10000 100000
4 4 0 male 30 . single 300000 300000 10000 100000 30000 10000
5 5 0 male 40 . single 10000 10000 1 0 0 1
*/
/*****************************************************
このセッションでは,表構成を指定するtable文の書き方を
説明します.
性別X収入の度数表ををつくります.
*/
proc tabulate data=creditcard;
class xgender xincome;
table xgender,xincome;
run;
/*
----------------------------------------------------------------------------------------------------------
| | xincome |
| |-----------------------------------------------------------------------------|
| | 1 | 100 | 300 | 500 | 700 | 1000 |
| |------------+------------+------------+------------+------------+------------|
| | N | N | N | N | N | N |
|--------------------------+------------+------------+------------+------------+------------+------------|
|xgender | | | | | | |
|--------------------------| | | | | | |
|female | 128.00| 59.00| 50.00| 25.00| 19.00| 19.00|
|--------------------------+------------+------------+------------+------------+------------+------------|
|male | 545.00| 55.00| 108.00| 129.00| 119.00| 160.00|
----------------------------------------------------------------------------------------------------------
*/
/*
要点は,
1.分類する変数を,class文で宣言する.
2.table文で,表側 カンマ(,) 表頭 を指定する.
です.
table文で,表の構成を指定するわけですが,ここで最低限必要なものは,
分類変数と3つの表制御記号のカンマ(,),アスタリスク(*),空白(" ")です.
カンマは表の次元を指定し,アスタリスクは交差を,空白は並列を示します.
*/
/********************************************************
次元指定
カンマの前に表側の変数を,カンマの後ろに表頭の変数を指定します.
freqのtables文のアスタリスクが,このカンマに相当します.
このように,カンマが1つで,2次元の表が指定できます.
カンマなしだと,1次元,カンマ2個だと3次元の表となります.
カンマ個数 次元 指定
0 1 表頭
1 2 表側, 表頭
2 3 シート,表側, 表頭
(3個以上は,指定できません)
*/
proc tabulate data=creditcard;
class xgender xincome xmarital;
table xmarital; /* 1次元*/
run;
/*
---------------------------
| xmarital |
|-------------------------|
| married | single |
|------------+------------|
| N | N |
|------------+------------|
| 929.00| 487.00|
---------------------------
*/
proc tabulate data=creditcard;
class xgender xincome xmarital;
table xincome,xmarital; /* 2次元*/
run;
/*
------------------------------------------------------
| | xmarital |
| |-------------------------|
| | married | single |
| |------------+------------|
| | N | N |
|--------------------------+------------+------------|
|xincome | | |
|--------------------------| | |
|1 | 448.00| 225.00|
|--------------------------+------------+------------|
|100 | 36.00| 78.00|
|--------------------------+------------+------------|
|300 | 76.00| 82.00|
|--------------------------+------------+------------|
|500 | 101.00| 53.00|
|--------------------------+------------+------------|
|700 | 108.00| 30.00|
|--------------------------+------------+------------|
|1000 | 160.00| 19.00|
------------------------------------------------------
*/
proc tabulate data=creditcard;
class xgender xincome xmarital;;
table xgender,xincome,xmarital; /* 3次元*/
run;
/*
xgender female
------------------------------------------------------
| | xmarital |
| |-------------------------|
| | married | single |
| |------------+------------|
| | N | N |
|--------------------------+------------+------------|
|xincome | | |
|--------------------------| | |
|1 | 58.00| 70.00|
|--------------------------+------------+------------|
|100 | 25.00| 34.00|
|--------------------------+------------+------------|
|300 | 24.00| 26.00|
|--------------------------+------------+------------|
|500 | 14.00| 11.00|
|--------------------------+------------+------------|
|700 | 13.00| 6.00|
|--------------------------+------------+------------|
|1000 | 12.00| 7.00|
------------------------------------------------------
xgender male
------------------------------------------------------
| | xmarital |
| |-------------------------|
| | married | single |
| |------------+------------|
| | N | N |
|--------------------------+------------+------------|
|xincome | | |
|--------------------------| | |
|1 | 390.00| 155.00|
|--------------------------+------------+------------|
|100 | 11.00| 44.00|
|--------------------------+------------+------------|
|300 | 52.00| 56.00|
|--------------------------+------------+------------|
|500 | 87.00| 42.00|
|--------------------------+------------+------------|
|700 | 95.00| 24.00|
|--------------------------+------------+------------|
|1000 | 148.00| 12.00|
------------------------------------------------------
*/
/***************************************************
交差指定
1つの次元の中で2変数以上のクロスを指定します.
交差したい変数同士をアスタリスクでつなぎます.
freq の table var1*var2/LIST; に相当します.
*/
proc tabulate data=creditcard;
class xgender xincome xmarital;
table xgender*xincome,xmarital; /* 表側で交差 */
run;
/*
------------------------------------------------------
| | xmarital |
| |-------------------------|
| | married | single |
| |------------+------------|
| | N | N |
|--------------------------+------------+------------|
|xgender |xincome | | |
|------------+-------------| | |
|female |1 | 58.00| 70.00|
| |-------------+------------+------------|
| |100 | 25.00| 34.00|
| |-------------+------------+------------|
| |300 | 24.00| 26.00|
| |-------------+------------+------------|
| |500 | 14.00| 11.00|
| |-------------+------------+------------|
| |700 | 13.00| 6.00|
| |-------------+------------+------------|
| |1000 | 12.00| 7.00|
|------------+-------------+------------+------------|
|male |1 | 390.00| 155.00|
| |-------------+------------+------------|
| |100 | 11.00| 44.00|
| |-------------+------------+------------|
| |300 | 52.00| 56.00|
| |-------------+------------+------------|
| |500 | 87.00| 42.00|
| |-------------+------------+------------|
| |700 | 95.00| 24.00|
| |-------------+------------+------------|
| |1000 | 148.00| 12.00|
------------------------------------------------------
*/
proc tabulate data=creditcard;
class xgender xincome xmarital xretail;
table xgender*xincome,xretail*xmarital; /* 表側と表頭で交差 */
run;
/*
----------------------------------------------------------------------------------------------------------
| | xretail |
| |-----------------------------------------------------------------------------|
| | 0 | 1 | 10000 |
| |-------------------------+-------------------------+-------------------------|
| | xmarital | xmarital | xmarital |
| |-------------------------+-------------------------+-------------------------|
| | married | single | married | single | married | single |
| |------------+------------+------------+------------+------------+------------|
| | N | N | N | N | N | N |
|--------------------------+------------+------------+------------+------------+------------+------------|
|xgender |xincome | | | | | | |
|------------+-------------| | | | | | |
|female |1 | 6.00| 9.00| 1.00| 3.00| 6.00| 4.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |100 | 2.00| 4.00| 1.00| .| 3.00| 6.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |300 | 4.00| 4.00| .| .| .| 2.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |500 | 2.00| 1.00| 1.00| .| 1.00| .|
| |-------------+------------+------------+------------+------------+------------+------------|
| |700 | 3.00| .| 2.00| .| .| 1.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |1000 | 3.00| 1.00| .| .| 1.00| 1.00|
|------------+-------------+------------+------------+------------+------------+------------+------------|
|male |1 | 119.00| 47.00| 21.00| 8.00| 33.00| 25.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |100 | 1.00| 19.00| 1.00| 1.00| 1.00| 3.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |300 | 13.00| 17.00| 3.00| 2.00| 6.00| 4.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |500 | 22.00| 10.00| 4.00| 1.00| 14.00| 5.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |700 | 22.00| 9.00| 7.00| 2.00| 15.00| 1.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |1000 | 47.00| 3.00| 8.00| 1.00| 15.00| .|
----------------------------------------------------------------------------------------------------------
(Continued)
----------------------------------------------------------------------------------------------------------
| | xretail |
| |-----------------------------------------------------------------------------|
| | 30000 | 100000 | 300000 |
| |-------------------------+-------------------------+-------------------------|
| | xmarital | xmarital | xmarital |
| |-------------------------+-------------------------+-------------------------|
| | married | single | married | single | married | single |
| |------------+------------+------------+------------+------------+------------|
| | N | N | N | N | N | N |
|--------------------------+------------+------------+------------+------------+------------+------------|
|xgender |xincome | | | | | | |
|------------+-------------| | | | | | |
|female |1 | 13.00| 21.00| 15.00| 16.00| 17.00| 17.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |100 | 5.00| 7.00| 5.00| 13.00| 9.00| 4.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |300 | 6.00| 5.00| 8.00| 7.00| 6.00| 8.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |500 | 2.00| 3.00| 4.00| 4.00| 4.00| 3.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |700 | 2.00| 1.00| 5.00| 4.00| 1.00| .|
| |-------------+------------+------------+------------+------------+------------+------------|
| |1000 | 3.00| 1.00| 3.00| 2.00| 2.00| 2.00|
|------------+-------------+------------+------------+------------+------------+------------+------------|
|male |1 | 84.00| 25.00| 93.00| 39.00| 40.00| 11.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |100 | 1.00| 5.00| 7.00| 10.00| .| 6.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |300 | 14.00| 15.00| 14.00| 14.00| 2.00| 4.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |500 | 21.00| 7.00| 16.00| 11.00| 10.00| 8.00|
| |-------------+------------+------------+------------+------------+------------+------------|
| |700 | 12.00| 5.00| 29.00| 7.00| 10.00| .|
| |-------------+------------+------------+------------+------------+------------+------------|
| |1000 | 29.00| 1.00| 32.00| 2.00| 17.00| 5.00|
----------------------------------------------------------------------------------------------------------
*/
/***************************************************
並列指定
1つの次元の中で2変数以上を並列させます.
指定のしかたは,変数を空白で区切って並べればよく,
freq の table var1 var2; と同じです.
変数だけでなく,交差指定した変数の組を並列することもできます.
*/
proc tabulate data=creditcard;
class xgender xincome xmarital xretail;
table xgender xretail xgender*xretail,xmarital;
run;
/*
------------------------------------------------------
| | xmarital |
| |-------------------------|
| | married | single |
| |------------+------------|
| | N | N |
|--------------------------+------------+------------|
|xgender | | |
|--------------------------| | |
|female | 146.00| 154.00|
|--------------------------+------------+------------|
|male | 783.00| 333.00|
|--------------------------+------------+------------|
|xretail | | |
|--------------------------| | |
|0 | 244.00| 124.00|
|--------------------------+------------+------------|
|1 | 49.00| 18.00|
|--------------------------+------------+------------|
|10000 | 95.00| 52.00|
|--------------------------+------------+------------|
|30000 | 192.00| 96.00|
|--------------------------+------------+------------|
|100000 | 231.00| 129.00|
|--------------------------+------------+------------|
|300000 | 118.00| 68.00|
|--------------------------+------------+------------|
|xgender |xretail | | |
|------------+-------------| | |
|female |0 | 20.00| 19.00|
| |-------------+------------+------------|
| |1 | 5.00| 3.00|
| |-------------+------------+------------|
| |10000 | 11.00| 14.00|
| |-------------+------------+------------|
| |30000 | 31.00| 38.00|
| |-------------+------------+------------|
| |100000 | 40.00| 46.00|
| |-------------+------------+------------|
| |300000 | 39.00| 34.00|
|------------+-------------+------------+------------|
|male |0 | 224.00| 105.00|
| |-------------+------------+------------|
| |1 | 44.00| 15.00|
| |-------------+------------+------------|
| |10000 | 84.00| 38.00|
| |-------------+------------+------------|
| |30000 | 161.00| 58.00|
| |-------------+------------+------------|
| |100000 | 191.00| 83.00|
| |-------------+------------+------------|
| |300000 | 79.00| 34.00|
------------------------------------------------------
*/
/*
上の例では,性別と小売の年間利用料と性別X小売の年間利用料の3つを表側に並列させています.
ダミー変数のALLを並列させることによって,計をつくることを覚えておきましょう.
*/
proc tabulate data=creditcard;
class xgender xincome xmarital xretail;
table all xgender,all xretail;
run;
/*
----------------------------------------------------------------------------------------------------------
| | | xretail |
| | |----------------------------------------------------------------|
| | All | 0 | 1 | 10000 | 30000 | 100000 |
| |------------+------------+------------+------------+------------+------------|
| | N | N | N | N | N | N |
|--------------------------+------------+------------+------------+------------+------------+------------|
|All | 1416.00| 368.00| 67.00| 147.00| 288.00| 360.00|
|--------------------------+------------+------------+------------+------------+------------+------------|
|xgender | | | | | | |
|--------------------------| | | | | | |
|female | 300.00| 39.00| 8.00| 25.00| 69.00| 86.00|
|--------------------------+------------+------------+------------+------------+------------+------------|
|male | 1116.00| 329.00| 59.00| 122.00| 219.00| 274.00|
----------------------------------------------------------------------------------------------------------
(Continued)
-----------------------------------------
| | xretail |
| |------------|
| | 300000 |
| |------------|
| | N |
|--------------------------+------------|
|All | 186.00|
|--------------------------+------------|
|xgender | |
|--------------------------| |
|female | 73.00|
|--------------------------+------------|
|male | 113.00|
-----------------------------------------
*/
/*********************************************************
括弧()を使って,交差を簡略表記することもできます.
*/
proc tabulate data=creditcard;
class xgender xincome xmarital xretail;
table (all xgender)*(all xretail),xmarital;
run;
/*
------------------------------------------------------
| | xmarital |
| |-------------------------|
| | married | single |
| |------------+------------|
| | N | N |
|--------------------------+------------+------------|
|All |All | 929.00| 487.00|
| |-------------+------------+------------|
| |xretail | | |
| |-------------| | |
| |0 | 244.00| 124.00|
| |-------------+------------+------------|
| |1 | 49.00| 18.00|
| |-------------+------------+------------|
| |10000 | 95.00| 52.00|
| |-------------+------------+------------|
| |30000 | 192.00| 96.00|
| |-------------+------------+------------|
| |100000 | 231.00| 129.00|
| |-------------+------------+------------|
| |300000 | 118.00| 68.00|
|------------+-------------+------------+------------|
|xgender |All | 146.00| 154.00|
|------------+-------------+------------+------------|
|female |xretail | | |
| |-------------| | |
| |0 | 20.00| 19.00|
| |-------------+------------+------------|
| |1 | 5.00| 3.00|
| |-------------+------------+------------|
| |10000 | 11.00| 14.00|
| |-------------+------------+------------|
| |30000 | 31.00| 38.00|
| |-------------+------------+------------|
| |100000 | 40.00| 46.00|
| |-------------+------------+------------|
| |300000 | 39.00| 34.00|
|------------+-------------+------------+------------|
|male |All | 783.00| 333.00|
| |-------------+------------+------------|
| |xretail | | |
| |-------------| | |
| |0 | 224.00| 105.00|
| |-------------+------------+------------|
| |1 | 44.00| 15.00|
| |-------------+------------+------------|
| |10000 | 84.00| 38.00|
| |-------------+------------+------------|
| |30000 | 161.00| 58.00|
| |-------------+------------+------------|
| |100000 | 191.00| 83.00|
| |-------------+------------+------------|
| |300000 | 79.00| 34.00|
------------------------------------------------------
*/
/*
表側の (all xgender)*(all xretail) は,
all*all all*xretail xgender*all xgender*xretail
すなわち
all xretail xgender xgender*xretail
と指定したことと実質同じです.
表指定の階層構造は,
1.カンマ区切りによる次元指定
2.空白による変数または交差変数組の並列指定
3.アスタリスクによる交差変数指定
となります.
*/
/***********************************************************
このセッションの最後に表の見栄えを変えるオプションのいくつかを紹介します.
1.罫線をすべてはずす.
慣れると,罫線はじゃまなことが多いです.次のオプションで罫線は消せます.
proc tabulate formchar=" " noseps;
formcharで,罫線に使う文字を全11種とも空白に指定します.
nosepsは,縦のセパレータを無しにするという意味です.
2.欠損値も数える
SASは,欠損値を除外するのが基本なので,特別なことがない限り,
必ず欠損表示をさせましょう.
proc tabulate missing;
3.統計量フォーマットのデフォルトを指定
proc tabulate format=7.;
度数に小数点2桁は変ですから,デフォルトで変えておくと便利です.
comma9.なんかもお勧めです.
個別にも指定できますが,この話は後日に.
4.表側の幅を小さくしたい
table ../RTS=10;
rts=で,表側の幅を,ページ幅の何パーセントにするかを指定できます.
あまり短くすると,1カテゴリが折り返して,複数行になってしまうので,
うまく調整する必要があります.
5.分類変数のカテゴリ値のフォーマット
freqなどと同じく変数にフォーマットをあてれば,自由にカテゴリ分類を変更できます.
format xage agef.;
*/
proc format;
value agef low-<20 = "under20"
20-<30 = "20s"
30-<40 = "30s"
40-<50 = "40s"
50-<60 = "50s"
60-high = "60s+"
;
run;
proc tabulate data=creditcard formchar=" " noseps missing format=7.;
class xage xrest;
table all xrest
,all xrest
/rts=10;
format xage agef.;
run;
/* xrest
All 0 1 10000 30000 100000 300000
N N N N N N N
All 1450 1108 60 102 106 56 18
xrest
0 1108 1108 . . . . .
1 60 . 60 . . . .
10000 102 . . 102 . . .
30000 106 . . . 106 . .
100000 56 . . . . 56 .
300000 18 . . . . . 18
*/
/*
tabulateを使うなら,
proc tabulate formchar=" " noseps missing format=
この1行は,暗記するべきです.損はしませんよ.
*/
**********************************************************************;
************** E N D *************************************************;
**********************************************************************;
質問・意見
質問・意見一覧