Data Preparation (11) - Clean (Feature Selection)
The FSelector package (Romanski, 2013) provides functions for selecting attributes from a given dataset.
It identifies irrelevant or redundant attributes so that they can be removed.
> library(FSelector)
> form <- formula(paste(target, "~."))  # target against all other variables; paste() converts its arguments to character and concatenates them
> cfs(form, ds[vars])  # cfs(): finds an attribute subset using correlation and entropy measures for continuous and discrete data
[1] "min_temp" "sunshine" "wind_gust_speed" "humidity_3pm"
[5] "pressure_3pm" "cloud_3pm"
> information.gain(form, ds[vars])  # information gain = H(Class) + H(Attribute) - H(Class, Attribute)
                attr_importance
min_temp           3.539250e-02
max_temp           0.000000e+00
rainfall           0.000000e+00
evaporation        0.000000e+00
sunshine           6.523179e-02
wind_gust_dir      4.073802e-02
wind_gust_speed    3.931861e-02
wind_dir_9am       3.537000e-02
wind_dir_3pm       1.759904e-02
wind_speed_9am     9.813415e-05
wind_speed_3pm     0.000000e+00
humidity_9am       2.858310e-02
humidity_3pm       6.189702e-02
pressure_9am       5.317622e-02
pressure_3pm       6.878745e-02
cloud_9am          3.314110e-02
cloud_3pm          6.893149e-02
temp_9am           0.000000e+00
temp_3pm           0.000000e+00
rain_today         1.261390e-02
>
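The comment on information.gain() gives the entropy identity behind its scores. As a rough illustration, the sketch below computes that identity in base R for discrete attributes only. The helpers entropy() and info_gain() and the toy vectors are hypothetical and not part of FSelector; natural logarithms are assumed, and FSelector may treat continuous attributes and the logarithm base differently, so these numbers are not meant to reproduce the table above.

```r
# Minimal base-R sketch (hypothetical helpers, not FSelector's code):
# information gain = H(Class) + H(Attribute) - H(Class, Attribute),
# for discrete vectors, using natural logarithms (an assumption).

# Shannon entropy of the (joint) distribution of one or more discrete vectors
entropy <- function(...) {
  counts <- table(...)                  # 1-D or 2-D contingency counts
  p <- counts[counts > 0] / sum(counts) # drop empty cells to avoid log(0)
  -sum(p * log(p))
}

info_gain <- function(class, attribute) {
  entropy(class) + entropy(attribute) - entropy(class, attribute)
}

# Hypothetical toy data, not the weather dataset used above
rain_tomorrow <- c("yes", "yes", "no", "no", "no", "no")
cloud_3pm     <- c("high", "high", "low", "low", "high", "low")
rain_today    <- c("yes", "no", "yes", "no", "yes", "no")

ig_cloud <- info_gain(rain_tomorrow, cloud_3pm)
ig_today <- info_gain(rain_tomorrow, rain_today)
print(c(cloud_3pm = ig_cloud, rain_today = ig_today))
```

In this toy data, cloud_3pm partly separates the classes and scores about 0.318 nats, while rain_today has the same class mix under each of its values, so its information gain is 0 up to floating-point error. This is the same pattern as the attributes that score 0 in the output above.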