1. Observations
Once a dataset has been loaded, the next step is to understand its shape.
We can inspect the dataset with functions such as head() and tail().
> head(ds)
Source: local data frame [6 x 24]
        Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir
1 2007-11-01 Canberra     8.0    24.3      0.0         3.4      6.3          NW
2 2007-11-02 Canberra    14.0    26.9      3.6         4.4      9.7         ENE
3 2007-11-03 Canberra    13.7    23.4      3.6         5.8      3.3          NW
4 2007-11-04 Canberra    13.3    15.5     39.8         7.2      9.1          NW
5 2007-11-05 Canberra     7.6    16.1      2.8         5.6     10.6         SSE
6 2007-11-06 Canberra     6.2    16.9      0.0         5.8      8.2          SE
Variables not shown: WindGustSpeed (dbl), WindDir9am (fctr), WindDir3pm (fctr),
WindSpeed9am (dbl), WindSpeed3pm (dbl), Humidity9am (int), Humidity3pm (int),
Pressure9am (dbl), Pressure3pm (dbl), Cloud9am (int), Cloud3pm (int), Temp9am
(dbl), Temp3pm (dbl), RainToday (fctr), RISK_MM (dbl), RainTomorrow (fctr)
> tail(ds)
Source: local data frame [6 x 24]
        Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir
1 2008-10-26 Canberra     7.9    26.1        0         6.8      3.5         NNW
2 2008-10-27 Canberra     9.0    30.7        0         7.6     12.1         NNW
3 2008-10-28 Canberra     7.1    28.4        0        11.6     12.7           N
4 2008-10-29 Canberra    12.5    19.9        0         8.4      5.3         ESE
5 2008-10-30 Canberra    12.5    26.9        0         5.0      7.1          NW
6 2008-10-31 Canberra    12.3    30.2        0         6.0     12.6          NW
Variables not shown: WindGustSpeed (dbl), WindDir9am (fctr), WindDir3pm (fctr),
WindSpeed9am (dbl), WindSpeed3pm (dbl), Humidity9am (int), Humidity3pm (int),
Pressure9am (dbl), Pressure3pm (dbl), Cloud9am (int), Cloud3pm (int), Temp9am
(dbl), Temp3pm (dbl), RainToday (fctr), RISK_MM (dbl), RainTomorrow (fctr)
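The same checks can be reproduced on a small scale. The sketch below is a minimal example, assuming a hypothetical stand-in data frame `obs` built from the first six rows printed by head(ds) above (only five of the 24 columns), since ds itself is loaded in an earlier post. It uses base R only: dim() reports the shape directly, and head()/tail() take an `n` argument to change the default of six rows.

```r
# Hypothetical stand-in for ds: the first six rows printed by head(ds),
# limited to five of the 24 columns.
obs <- data.frame(
  Date     = as.Date(c("2007-11-01", "2007-11-02", "2007-11-03",
                       "2007-11-04", "2007-11-05", "2007-11-06")),
  Location = factor(rep("Canberra", 6)),
  MinTemp  = c(8.0, 14.0, 13.7, 13.3, 7.6, 6.2),
  MaxTemp  = c(24.3, 26.9, 23.4, 15.5, 16.1, 16.9),
  Rainfall = c(0.0, 3.6, 3.6, 39.8, 2.8, 0.0)
)

# Shape of the data: rows x columns (6 x 5 here; str(ds) reports 366 x 24).
shape <- dim(obs)
print(shape)
print(names(obs))

# head() and tail() return 6 rows by default; n picks another count.
first_two <- head(obs, n = 2)
last_two  <- tail(obs, n = 2)
print(first_two)
print(last_two)
```

Applied to the real ds, dim(ds) gives the same 366 x 24 figure that str() reports below, without printing any rows.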
2. Review Structure
Next, we use str() to report the structure of the dataset. This again gives an overview of
what the data looks like and, in addition, shows how each variable is stored (Date, factor, ordered factor, numeric, and so on).
> str(ds)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 366 obs. of 24 variables:
$ Date : Date, format: "2007-11-01" "2007-11-02" ...
$ Location : Factor w/ 49 levels "Adelaide","Albany",..: 10 10 10 10 10 10 10 10 10 10 ...
$ MinTemp : num 8 14 13.7 13.3 7.6 6.2 6.1 8.3 8.8 8.4 ...
$ MaxTemp : num 24.3 26.9 23.4 15.5 16.1 16.9 18.2 17 19.5 22.8 ...
$ Rainfall : num 0 3.6 3.6 39.8 2.8 0 0.2 0 0 16.2 ...
$ Evaporation : num 3.4 4.4 5.8 7.2 5.6 5.8 4.2 5.6 4 5.4 ...
$ Sunshine : num 6.3 9.7 3.3 9.1 10.6 8.2 8.4 4.6 4.1 7.7 ...
$ WindGustDir : Ord.factor w/ 16 levels "N"<"NNE"<"NE"<..: 15 4 15 15 8 7 7 5 9 5 ...
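In the str() output, WindGustDir is an ordered factor, and the integers after the colon are level codes rather than the directions themselves. A minimal sketch, assuming the 16-point compass order implied by the levels shown ("N"<"NNE"<"NE"<..); `compass`, `gust_dir`, and `obs` are hypothetical names. Rebuilding the first six values from head(ds) should reproduce the codes 15 4 15 15 8 7 printed above.

```r
# Assumed 16-point compass order, consistent with "N"<"NNE"<"NE"<.. in str(ds).
compass <- c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
             "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW")

# Hypothetical: WindGustDir for the first six observations shown by head(ds).
gust_dir <- factor(c("NW", "ENE", "NW", "NW", "SSE", "SE"),
                   levels = compass, ordered = TRUE)
str(gust_dir)

# The integer codes that str() prints after the colon.
codes <- as.integer(gust_dir)
print(codes)

# Ordered factors compare by level position: "NW" (15) comes after "N" (1).
after_north <- gust_dir[1] > "N"
print(after_north)

# Storage class per column; class(col)[1] keeps one entry per column,
# since an ordered factor has the class vector c("ordered", "factor").
obs <- data.frame(MinTemp = c(8.0, 14.0), WindGustDir = gust_dir[1:2])
classes <- vapply(obs, function(col) class(col)[1], character(1))
print(classes)
```

Using vapply() with class(col)[1] is a compact alternative to str() when only the storage class of each column is of interest.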
3. Review Summary
> summary(ds)
        Date              Location           MinTemp          MaxTemp
 Min.   :2007-11-01   Canberra     :366   Min.   :-5.300   Min.   : 7.60
 1st Qu.:2008-01-31   Adelaide     :  0   1st Qu.: 2.300   1st Qu.:15.03
 Median :2008-05-01   Albany       :  0   Median : 7.450   Median :19.65
 Mean   :2008-05-01   Albury       :  0   Mean   : 7.266   Mean   :20.55
 3rd Qu.:2008-07-31   AliceSprings :  0   3rd Qu.:12.500   3rd Qu.:25.50
 Max.   :2008-10-31   BadgerysCreek:  0   Max.   :20.900   Max.   :35.80
                      (Other)      :  0
    Rainfall       Evaporation        Sunshine      WindGustDir
 Min.   : 0.000   Min.   : 0.200   Min.   : 0.000   NW     : 73
 1st Qu.: 0.000   1st Qu.: 2.200   1st Qu.: 5.950   NNW    : 44
 Median : 0.000   Median : 4.200   Median : 8.600   E      : 37
 Mean   : 1.428   Mean   : 4.522   Mean   : 7.909   WNW    : 35
 3rd Qu.: 0.200   3rd Qu.: 6.400   3rd Qu.:10.500   ENE    : 30
 Max.   :39.800   Max.   :13.800   Max.   :13.600   (Other):144
                                   NA's   :3        NA's   :  3
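summary() adds an NA's row only for columns that contain missing values. To count missing values for every column at once, a common base-R idiom is colSums(is.na(...)). A minimal sketch, assuming a hypothetical data frame `obs`: the Rainfall values come from head(ds), and the NA entries in the other columns are inserted purely for illustration.

```r
# Hypothetical data: Rainfall from head(ds); NAs inserted only for illustration.
obs <- data.frame(
  Rainfall    = c(0.0, 3.6, 3.6, 39.8, 2.8, 0.0),
  Sunshine    = c(6.3, NA, 3.3, 9.1, NA, 8.2),
  WindGustDir = factor(c("NW", "ENE", NA, "NW", "SSE", "SE"))
)

# Missing values per column: is.na() gives a logical matrix, colSums() counts TRUEs.
na_counts <- colSums(is.na(obs))
print(na_counts)

# Names of the columns that contain at least one NA.
cols_with_na <- names(na_counts)[na_counts > 0]
print(cols_with_na)

# summary() of a numeric vector adds an "NA's" entry only when NAs are present.
sun_summary  <- summary(obs$Sunshine)
rain_summary <- summary(obs$Rainfall)
print(sun_summary)
print(rain_summary)
```

Running colSums(is.na(ds)) on the real dataset would list the missing-value count for all 24 variables, including those summary() output truncates.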