갈루아의 반서재

1. Observations


After loading a dataset, the first task is to understand its shape.

Functions such as head() and tail() let us inspect its first and last rows.

> head(ds)
Source: local data frame [6 x 24]

        Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir
1 2007-11-01 Canberra     8.0    24.3      0.0         3.4      6.3          NW
2 2007-11-02 Canberra    14.0    26.9      3.6         4.4      9.7         ENE
3 2007-11-03 Canberra    13.7    23.4      3.6         5.8      3.3          NW
4 2007-11-04 Canberra    13.3    15.5     39.8         7.2      9.1          NW
5 2007-11-05 Canberra     7.6    16.1      2.8         5.6     10.6         SSE
6 2007-11-06 Canberra     6.2    16.9      0.0         5.8      8.2          SE
Variables not shown: WindGustSpeed (dbl), WindDir9am (fctr), WindDir3pm (fctr),
  WindSpeed9am (dbl), WindSpeed3pm (dbl), Humidity9am (int), Humidity3pm (int),
  Pressure9am (dbl), Pressure3pm (dbl), Cloud9am (int), Cloud3pm (int), Temp9am
  (dbl), Temp3pm (dbl), RainToday (fctr), RISK_MM (dbl), RainTomorrow (fctr)



> tail(ds)
Source: local data frame [6 x 24]

        Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir
1 2008-10-26 Canberra     7.9    26.1        0         6.8      3.5         NNW
2 2008-10-27 Canberra     9.0    30.7        0         7.6     12.1         NNW
3 2008-10-28 Canberra     7.1    28.4        0        11.6     12.7           N
4 2008-10-29 Canberra    12.5    19.9        0         8.4      5.3         ESE
5 2008-10-30 Canberra    12.5    26.9        0         5.0      7.1          NW
6 2008-10-31 Canberra    12.3    30.2        0         6.0     12.6          NW
Variables not shown: WindGustSpeed (dbl), WindDir9am (fctr), WindDir3pm (fctr),
  WindSpeed9am (dbl), WindSpeed3pm (dbl), Humidity9am (int), Humidity3pm (int),
  Pressure9am (dbl), Pressure3pm (dbl), Cloud9am (int), Cloud3pm (int), Temp9am
  (dbl), Temp3pm (dbl), RainToday (fctr), RISK_MM (dbl), RainTomorrow (fctr)
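
Besides head() and tail(), a few base R calls report the shape directly. The sketch below is only an illustration: ds_demo is a hypothetical stand-in for ds, rebuilt from the six rows and first five columns printed by head(ds) above, so that it runs without the original data.

```r
# ds_demo is a hypothetical stand-in for `ds`: the six rows shown by
# head(ds), restricted to the first five columns.
ds_demo <- data.frame(
  Date     = as.Date(c("2007-11-01", "2007-11-02", "2007-11-03",
                       "2007-11-04", "2007-11-05", "2007-11-06")),
  Location = factor(rep("Canberra", 6)),
  MinTemp  = c(8.0, 14.0, 13.7, 13.3, 7.6, 6.2),
  MaxTemp  = c(24.3, 26.9, 23.4, 15.5, 16.1, 16.9),
  Rainfall = c(0.0, 3.6, 3.6, 39.8, 2.8, 0.0)
)

dim(ds_demo)          # number of rows, then number of columns
names(ds_demo)        # column names
head(ds_demo, n = 3)  # first three rows instead of the default six
tail(ds_demo, n = 2)  # last two rows
```

On the real data, dim(ds) should agree with the [6 x 24] header only in its column count, since head() and tail() return six rows by default.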


2. Review Structure

Next we use str() to report on the structure of the dataset. Once again we get an overview of what the data looks like and, this time, also how it is stored.

> str(ds)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 366 obs. of  24 variables:
 $ Date         : Date, format: "2007-11-01" "2007-11-02" ...
 $ Location     : Factor w/ 49 levels "Adelaide","Albany",..: 10 10 10 10 10 10 10 10 10 10 ...
 $ MinTemp      : num  8 14 13.7 13.3 7.6 6.2 6.1 8.3 8.8 8.4 ...
 $ MaxTemp      : num  24.3 26.9 23.4 15.5 16.1 16.9 18.2 17 19.5 22.8 ...
 $ Rainfall     : num  0 3.6 3.6 39.8 2.8 0 0.2 0 0 16.2 ...
 $ Evaporation  : num  3.4 4.4 5.8 7.2 5.6 5.8 4.2 5.6 4 5.4 ...
 $ Sunshine     : num  6.3 9.7 3.3 9.1 10.6 8.2 8.4 4.6 4.1 7.7 ...
 $ WindGustDir  : Ord.factor w/ 16 levels "N"<"NNE"<"NE"<..: 15 4 15 15 8 7 7 5 9 5 ...
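
The integers that str() prints for a factor are level codes, not data values. As a minimal sketch, the code below rebuilds the first six WindGustDir values from head(ds) as an ordered factor. The full level order is an assumption: str() shows only "N"<"NNE"<"NE", and the remaining levels are taken to follow the usual clockwise 16-point compass, which is consistent with the codes 15 4 15 15 8 7 shown above.

```r
# Assumed level order: 16-point compass, clockwise from north.
# Only the first three levels are confirmed by the str() output.
compass <- c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
             "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW")

# First six WindGustDir values, as printed by head(ds).
wind <- factor(c("NW", "ENE", "NW", "NW", "SSE", "SE"),
               levels = compass, ordered = TRUE)

class(wind)        # an ordered factor carries two classes
is.ordered(wind)   # TRUE for an Ord.factor
nlevels(wind)      # number of levels, including unused ones
as.integer(wind)   # the integer codes that str() displays
```

For a whole data frame, sapply(ds, function(col) class(col)[1]) gives a compact per-column view of the same storage information.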

3. Review Summary

> summary(ds)
      Date                     Location      MinTemp          MaxTemp     
 Min.   :2007-11-01   Canberra     :366   Min.   :-5.300   Min.   : 7.60  
 1st Qu.:2008-01-31   Adelaide     :  0   1st Qu.: 2.300   1st Qu.:15.03  
 Median :2008-05-01   Albany       :  0   Median : 7.450   Median :19.65  
 Mean   :2008-05-01   Albury       :  0   Mean   : 7.266   Mean   :20.55  
 3rd Qu.:2008-07-31   AliceSprings :  0   3rd Qu.:12.500   3rd Qu.:25.50  
 Max.   :2008-10-31   BadgerysCreek:  0   Max.   :20.900   Max.   :35.80  
                      (Other)      :  0                                   
    Rainfall       Evaporation        Sunshine       WindGustDir 
 Min.   : 0.000   Min.   : 0.200   Min.   : 0.000   NW     : 73  
 1st Qu.: 0.000   1st Qu.: 2.200   1st Qu.: 5.950   NNW    : 44  
 Median : 0.000   Median : 4.200   Median : 8.600   E      : 37  
 Mean   : 1.428   Mean   : 4.522   Mean   : 7.909   WNW    : 35  
 3rd Qu.: 0.200   3rd Qu.: 6.400   3rd Qu.:10.500   ENE    : 30  
 Max.   :39.800   Max.   :13.800   Max.   :13.600   (Other):144  
                                   NA's   :3        NA's   :  3
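
Two things stand out in this summary: Location lists levels with a count of 0 (every observation is Canberra), and Sunshine and WindGustDir each report 3 NA's. The sketch below shows one way to check both on a small hypothetical stand-in. ds_demo, its three-level Location factor, and the position of the NA are invented for illustration and are not taken from the real data.

```r
# Hypothetical stand-in: a factor with unused levels and a numeric
# column containing one missing value.
ds_demo <- data.frame(
  Location = factor(rep("Canberra", 4),
                    levels = c("Adelaide", "Albany", "Canberra")),
  Sunshine = c(6.3, NA, 3.3, 9.1)
)

# Count missing values per column.
na_counts <- colSums(is.na(ds_demo))
na_counts

# Unused levels show up with a count of zero, as in summary(ds).
table(ds_demo$Location)

# droplevels() removes levels that never occur in the data.
ds_demo$Location <- droplevels(ds_demo$Location)
table(ds_demo$Location)
```

Whether to drop unused levels depends on the analysis: keeping them preserves the link to a larger dataset, while dropping them keeps summaries and plots free of empty categories.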