Posts in the 'programing/R studio' category (Page 2)


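Decision trees / ensemble analysis / logistic regression 2018.07.30
Principal component analysis example 2018.07.29
Metric and non-metric MDS examples 2018.07.29
Time series analysis example 2018.07.29
Regression analysis - lm 2018.07.02
Geometric distribution - dgeom 2018.07.02
ggplot2 - scale_x_discrete, scale_y_discrete - ordering x and y values 2018.06.29
createDataPartition() - stratified random sampling 2018.06.26
dcast() 2018.06.26
What is normalization? 2018.06.25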

Decision trees / ensemble analysis / logistic regression

2018. 7. 30. 21:44

##### Decision tree

library(tree)

iris.tr<-tree(Species~., iris)

plot(iris.tr)

text(iris.tr)

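# Split the data roughly 70:30 into training and test sets

library(party)

set.seed(1234) # make the random split reproducible

idx<-sample(2, nrow(iris), replace=T, prob=c(0.7, 0.3))

train.data<-iris[idx==1,] # idx==1 was drawn with probability 0.7, so about 70% of rows

test.data<-iris[idx==2,] # the remaining ~30%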

iris.tree<-ctree(Species~., data=train.data)

plot(iris.tree)

plot(iris.tree, type="simple")

# Compare the predicted classes with the actual classes on the training data

table(predict(iris.tree), train.data$Species)

# Check accuracy on the test data

test.pre<-predict(iris.tree, newdata=test.data)

table(test.pre, test.data$Species)
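You can condense the two confusion tables above into one accuracy figure: correct predictions divided by all predictions. The sketch below is an illustration under a few assumptions. It uses `rpart` (a recommended package that ships with R) instead of `party::ctree`, so it needs no extra install. It also fixes a seed and defines a hypothetical helper, `holdout_accuracy`. The exact accuracy depends on the random split.

```r
library(rpart)

# Hypothetical helper: share of correctly classified cases in a confusion table
holdout_accuracy <- function(tab) sum(diag(tab)) / sum(tab)

set.seed(1234)  # reproducible split
idx <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train.data <- iris[idx == 1, ]  # about 70% for training
test.data  <- iris[idx == 2, ]  # about 30% for testing

fit  <- rpart(Species ~ ., data = train.data, method = "class")
pred <- predict(fit, newdata = test.data, type = "class")

# Rows: predicted class, columns: actual class (both keep all three levels)
conf.mat <- table(pred, test.data$Species)
print(conf.mat)

acc <- holdout_accuracy(conf.mat)
print(acc)
```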

##### Ensemble analysis

# Random forest

library(randomForest)

idx<-sample(2, nrow(iris), replace=T, prob=c(0.7, 0.3))

train.data<-iris[idx==1,] # about 70% for training

test.data<-iris[idx==2,] # about 30% for testing

r.f<-randomForest(Species~., data=train.data, ntree=100, proximity=T)

table(predict(r.f), train.data$Species)

plot(r.f)

varImpPlot(r.f)

# Predict on the test data

pre.rf<-predict(r.f, newdata=test.data)

table(pre.rf, test.data$Species)

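# margin(): proportion of votes for the true class minus the largest proportion for any other class;
# for a randomForest object it is computed from the out-of-bag votes on the training data

plot(margin(r.f))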
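A random forest predicts by letting many trees vote on the class, with each tree grown on a bootstrap sample. The sketch below shows that idea as plain bagging with `rpart` trees. It is a simplification, not how `randomForest` is implemented, because randomForest also draws a random subset of variables at every split. The hypothetical helpers `bagged_trees` and `majority_vote` build 25 trees and take the majority vote.

```r
library(rpart)

set.seed(42)
idx <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train.data <- iris[idx == 1, ]
test.data  <- iris[idx == 2, ]

# Grow n.trees classification trees, each on a bootstrap sample of the training rows
bagged_trees <- function(data, n.trees = 25) {
  lapply(seq_len(n.trees), function(i) {
    boot <- data[sample(nrow(data), replace = TRUE), ]
    rpart(Species ~ ., data = boot, method = "class")
  })
}

# Every tree predicts a class for every row; the most frequent class per row wins
majority_vote <- function(trees, newdata) {
  votes <- sapply(trees, function(fit) {
    as.character(predict(fit, newdata, type = "class"))
  })
  winners <- apply(votes, 1, function(v) names(which.max(table(v))))
  factor(winners, levels = levels(newdata$Species))
}

trees <- bagged_trees(train.data)
pred.bag <- majority_vote(trees, test.data)
table(pred.bag, test.data$Species)
```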

# Performance analysis: ROC and lift curves for a conditional random forest on the kyphosis data

library(rpart)

library(party)

library(ROCR)

x<-kyphosis[sample(1:nrow(kyphosis), nrow(kyphosis), replace=F),] # shuffle the rows

n.train<-floor(nrow(x)*0.75)

x.train<-x[1:n.train,] # first 75% of the shuffled rows

x.evaluate<-x[(n.train+1):nrow(x),] # remaining 25%, no overlap with x.train

x.model<-cforest(Kyphosis~Age+Number+Start, data=x.train)

x.evaluate$prediction<-predict(x.model, newdata=x.evaluate)

x.evaluate$correct<-x.evaluate$prediction == x.evaluate$Kyphosis

print(paste("Proportion of correct predictions:", mean(x.evaluate$correct)))

# treeresponse() returns c(P(absent), P(present)) per row; 1 - P(absent) gives P(present)

x.evaluate$probabilities<-1-unlist(treeresponse(x.model, newdata=x.evaluate), use.names=F)[seq(1, nrow(x.evaluate)*2, 2)]

pred<-prediction(x.evaluate$probabilities, x.evaluate$Kyphosis)

perf<-performance(pred, "tpr", "fpr")

plot(perf, main="ROC curve", colorize=T)

perf<-performance(pred, "lift", "rpp")

plot(perf, main="lift curve", colorize=T)
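`performance(pred, "tpr", "fpr")` sweeps a threshold over the predicted probabilities and records the true and false positive rates at each step. The sketch below does that computation in base R on made-up scores. The names `roc_points` and `auc_rank` are hypothetical, and tied scores are not handled. The sketch also shows the rank form of the AUC, which equals the area under those ROC points.

```r
# Hypothetical helper: ROC points, lowering the threshold one case at a time
# (assumes no tied scores)
roc_points <- function(scores, labels) {
  lab <- labels[order(scores, decreasing = TRUE)]
  tpr <- c(0, cumsum(lab == 1) / sum(lab == 1))
  fpr <- c(0, cumsum(lab == 0) / sum(lab == 0))
  data.frame(fpr = fpr, tpr = tpr)
}

# AUC as the probability that a random positive scores higher than a random negative
auc_rank <- function(scores, labels) {
  pos <- scores[labels == 1]
  neg <- scores[labels == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

scores <- c(0.9, 0.8, 0.7, 0.6, 0.55)  # made-up predicted probabilities
labels <- c(1, 1, 0, 1, 0)             # 1 = positive class

roc <- roc_points(scores, labels)
plot(roc$fpr, roc$tpr, type = "s",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # a random classifier

auc <- auc_rank(scores, labels)
# Trapezoidal area under the ROC points gives the same value
area <- sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
c(auc = auc, area = area)
```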

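##### Logistic regression

# `a` is not defined in the original; a two-class subset of iris (setosa vs. versicolor) is assumed here

a<-subset(iris, Species=="setosa" | Species=="versicolor")

a$Species<-factor(a$Species) # drop the unused "virginica" level

b<-glm(Species~Sepal.Length, data=a, family=binomial)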

summary(b)
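A fitted logistic regression turns the linear predictor into a probability with the logistic function, p = 1 / (1 + exp(-(b0 + b1 x))). Here glm() models the probability of the second factor level. The sketch below checks this against `predict(type = "response")`. As an illustration, it assumes a two-class subset of iris (setosa vs. versicolor).

```r
# Assumed two-class data: setosa vs. versicolor
a <- subset(iris, Species == "setosa" | Species == "versicolor")
a$Species <- factor(a$Species)   # keep only the two levels that occur

b <- glm(Species ~ Sepal.Length, data = a, family = binomial)

# glm() models P(Species = "versicolor"), the second level
eta <- coef(b)[1] + coef(b)[2] * a$Sepal.Length   # linear predictor (log-odds)
p.manual <- 1 / (1 + exp(-eta))                    # logistic function
p.glm <- predict(b, type = "response")

all.equal(unname(p.manual), unname(p.glm))
exp(coef(b)[2])  # odds ratio for a 1 cm increase in Sepal.Length
```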

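Other posts in the 'programing > R studio' category

The all.x = TRUE argument of merge (0)	2018.09.10
R - basic functions - paste / paste0 (0)	2018.08.22
Principal component analysis example (0)	2018.07.29
Metric and non-metric MDS examples (0)	2018.07.29
Time series analysis example (0)	2018.07.29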

Principal component analysis example

2018. 7. 29. 17:05

##### Principal component analysis example

library(datasets)

data(USArrests)

# Draw a scatter plot for every pair of variables

pairs(USArrests, panel=panel.smooth, main="USArrests data")

US.prin<-princomp(USArrests, cor=T)

summary(US.prin)

# Proportion of the total variance explained by each principal component

screeplot(US.prin, npcs=4, type="lines")

# Loadings: the weight with which each of the four variables contributes to Comp.1 through Comp.4

loadings(US.prin)

# Scores: the value of each linear combination Comp.1 through Comp.4 for every state

US.prin$scores

biplot(US.prin)
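You can reproduce the quantities printed above by hand, which shows what they mean. The proportion of variance is each eigenvalue (`sdev^2`) divided by their sum. The scores are the standardized data multiplied by the loadings. The sketch below does both in base R. Note that `princomp()` standardizes with its own `center` and `scale` components, and its scale uses divisor n, not n - 1, so plain `scale(USArrests)` would not match exactly.

```r
US.prin <- princomp(USArrests, cor = TRUE)

# Proportion of variance: each eigenvalue (sdev^2) over the total
prop.var <- US.prin$sdev^2 / sum(US.prin$sdev^2)
round(prop.var, 3)

# With cor = TRUE the eigenvalues of the correlation matrix sum to the number of variables
sum(US.prin$sdev^2)

# Scores = standardized data %*% loadings, using princomp's own center and scale
z <- scale(USArrests, center = US.prin$center, scale = US.prin$scale)
manual.scores <- z %*% unclass(loadings(US.prin))
all.equal(unname(manual.scores), unname(US.prin$scores))
```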


Metric and non-metric MDS examples

2018. 7. 29. 15:44

##### Metric MDS: cmdscale example

library(MASS)

head(eurodist)

str(eurodist)

eurodist

# cmdscale(): map the 21 cities into two dimensions

loc<-cmdscale(eurodist)

x<-loc[,1]

y<- -loc[,2] # flip the sign so that northern cities appear at the top

plot(x, y, type="n", asp=1, main="Metric MDS") # asp=1: same scale on both axes (y/x aspect ratio)

text(x, y, rownames(loc), cex=0.7)

abline(v=0, h=0, lty=2, lwd=0.5) # v=0: vertical line at x=0; h=0: horizontal line at y=0; lty=line type; lwd=line width
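cmdscale() performs classical (metric) MDS. It squares the distances, double-centers them into an inner-product matrix, and uses the top eigenvectors, scaled by the square roots of their eigenvalues, as coordinates. The sketch below implements those steps; the function name `classical_mds` and the toy points are hypothetical. It is checked against the original distances and against cmdscale(), up to the sign of each axis.

```r
classical_mds <- function(d, k = 2) {
  D2 <- as.matrix(d)^2                      # squared distances
  n  <- nrow(D2)
  J  <- diag(n) - matrix(1 / n, n, n)       # centering matrix
  B  <- -0.5 * J %*% D2 %*% J               # double-centered inner-product matrix
  e  <- eigen(B, symmetric = TRUE)
  # coordinates = top-k eigenvectors scaled by sqrt of their (non-negative) eigenvalues
  e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(pmax(e$values[1:k], 0)), nrow = k)
}

# Toy check: points that really live in 2-D are recovered up to rotation/reflection
pts <- cbind(x = c(0, 3, 0, 3, 1), y = c(0, 0, 4, 4, 2))
d <- dist(pts)
Y <- classical_mds(d)
all.equal(as.vector(dist(Y)), as.vector(d))

# On eurodist the result agrees with cmdscale() up to the sign of each axis
loc.manual <- classical_mds(eurodist)
all.equal(abs(loc.manual), abs(unname(cmdscale(eurodist))))
```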

##### Non-metric MDS: isoMDS and sammon examples

library(MASS)

data(swiss)

swiss

head(swiss)

# isoMDS

swiss.x<-as.matrix(swiss[,-1])

swiss.dist<-dist(swiss.x) # Euclidean distances between the rows (provinces) of the data matrix

swiss.mds<-isoMDS(swiss.dist)

plot(swiss.mds$points, type="n")

text(swiss.mds$points, labels=as.character(1:nrow(swiss.x)))

abline(v=0, h=0, lty=2, lwd=0.5)

?isoMDS

# sammon

swiss.x<-as.matrix(swiss[,-1])

swiss.sammon<-sammon(dist(swiss.x))

plot(swiss.sammon$points, type="n")

text(swiss.sammon$points, labels=as.character(1:nrow(swiss.x)))

abline(v=0, h=0, lty=2, lwd=0.5)
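sammon() minimizes Sammon's stress. The stress weights each squared distance error by the inverse of the original distance, so small distances are preserved more carefully: E = (1 / Σ d_ij) Σ (d_ij - δ_ij)² / d_ij. Here d_ij are the original dissimilarities and δ_ij are the distances in the map. The sketch below writes out this formula as a hypothetical helper, `sammon_stress`, and tests it on toy points. The final comparison with `MASS::sammon` is only expected to be close, not identical.

```r
# Hypothetical helper: Sammon's stress of configuration `conf` for dissimilarities `d`
sammon_stress <- function(d, conf) {
  d.orig <- as.vector(d)            # original dissimilarities (i < j), must be > 0
  d.conf <- as.vector(dist(conf))   # distances in the low-dimensional map
  sum((d.orig - d.conf)^2 / d.orig) / sum(d.orig)
}

pts <- cbind(c(0, 3, 0, 3, 1), c(0, 0, 4, 4, 2))
d <- dist(pts)

s0 <- sammon_stress(d, pts)          # perfect map: stress 0
set.seed(7)
s1 <- sammon_stress(d, pts + rnorm(length(pts), sd = 0.5))  # distorted map: stress > 0
c(perfect = s0, distorted = s1)

# If MASS is available, compare with the stress reported by sammon()
if (requireNamespace("MASS", quietly = TRUE)) {
  swiss.x <- as.matrix(swiss[, -1])
  fit <- MASS::sammon(dist(swiss.x), trace = FALSE)
  c(reported = fit$stress, recomputed = sammon_stress(dist(swiss.x), fit$points))
}
```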


Time series analysis example

2018. 7. 29. 14:18

library(tseries)

library(forecast)

library(TTR)

king<-scan("http://robjhyndman.com/tsdldata/misc/kings.dat", skip=3)

king

# ts = time series

# ts(data): create a time series object

# start = time of the first observation; end = time of the last observation; frequency = number of observations per unit of time

king.ts<-ts(king)

# plot.ts(time series data): plot a time series

plot.ts(king.ts)

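# Smooth the series with a moving average over every 3 consecutive observations (each observation is one successive king, not one year)

# SMA(): arithmetic mean of the last n observations of the series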

king.sma3<-SMA(king.ts, n=3)

plot.ts(king.sma3)

# A moving average over every 8 consecutive observations gives an even smoother curve

king.sma8<-SMA(king.ts, n=8)

plot.ts(king.sma8)
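SMA(king.ts, n = 3) averages the current observation and the two before it. You can write the same trailing moving average with base R's `stats::filter()`, which is a different function from `dplyr::filter`. In the sketch below, `sma_base` is a hypothetical name, and the TTR comparison only runs if TTR is installed.

```r
# Trailing simple moving average of the last n observations (first n - 1 values are NA)
sma_base <- function(x, n) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

sma_base(c(1, 2, 3, 4, 5), n = 3)   # NA NA 2 3 4

if (requireNamespace("TTR", quietly = TRUE)) {
  x <- c(5, 7, 4, 9, 10, 6)
  all.equal(sma_base(x, 3), as.numeric(TTR::SMA(x, n = 3)))
}
```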

# ARIMA model

# First difference (value at the current time point minus the value at the previous time point)

king.ff1<-diff(king.ts, differences=1)

plot.ts(king.ff1)

# Use the ACF and PACF to choose a suitable ARIMA model

acf(king.ff1, lag.max=20)

acf(king.ff1, lag.max=20, plot=F)

pacf(king.ff1, lag.max=20)

pacf(king.ff1, lag.max=20, plot=F)
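acf() estimates the autocorrelation at lag k as the sum of products of deviations that are k steps apart, divided by the total sum of squares. The sketch below computes one lag by hand on toy values after first differencing. `acf_lag` is a hypothetical helper.

```r
# Sample autocorrelation at lag k (same formula as acf() with its default demeaning)
acf_lag <- function(x, k) {
  x <- as.numeric(x)
  n <- length(x)
  m <- mean(x)
  sum((x[1:(n - k)] - m) * (x[(1 + k):n] - m)) / sum((x - m)^2)
}

x  <- c(3, 5, 4, 8, 6, 9, 7, 11, 10, 12)  # toy values
dx <- diff(x, differences = 1)           # x[t] - x[t-1]

acf_lag(dx, 1)
acf(dx, lag.max = 3, plot = FALSE)$acf[2]   # element 1 is lag 0, element 2 is lag 1
```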

# auto.arima(): automatically searches for a suitable ARIMA model

auto.arima(king)

king.arima<-arima(king.ts, order=c(0,1,1))

king.arima

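# forecast(): forecast future values

# The book uses forecast.Arima(), which no longer works because recent versions of the forecast package do not export it; call the generic forecast() instead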

king.forecasts<-forecast(king.arima, h=5)

king.forecasts

plot(king.forecasts)
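For an ARIMA(0,1,1) model without drift, every h-step-ahead point forecast equals the one-step forecast. This is why the forecast line of king.forecasts is flat while the prediction interval widens. The sketch below shows this with base R's `arima()` and `predict()`. As an assumption for illustration, it uses a simulated series, not the kings data.

```r
set.seed(2018)
y <- ts(cumsum(rnorm(60)) + 50)   # simulated random-walk-like series

fit <- arima(y, order = c(0, 1, 1))
fc <- predict(fit, n.ahead = 5)

fc$pred   # point forecasts: the same value for all five horizons
fc$se     # standard errors grow with the horizon
```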


Regression analysis - lm

2018. 7. 2. 17:48

lm(mpg~hp, data=mtcars)

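# Fitted line: y = ax + b

# the (Intercept) coefficient is b

# the hp coefficient is the slope a

# i.e., mpg = 30.10 - 0.068 * hp (coefficients 30.09886 and -0.06823)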

DF<-data.frame(Work_hour=1:7, Total_pay=seq(10000, 70000, by=10000))

# Work_hour is the X variable; Total_pay is the Y variable

plot(Total_pay ~ Work_hour, data=DF, pch=20, col="red") # pch sets the point symbol

grid()

LR<-lm(Total_pay ~ Work_hour, data=DF) # response ~ predictor

mode(LR)

names(LR)


abline(LR, col="blue", lwd=2) # lwd sets the line width
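With a single predictor, least squares has a closed form: slope a = cov(x, y) / var(x) and intercept b = mean(y) - a * mean(x). The sketch below checks this against lm() on the same mtcars example. It also checks the DF example, where Total_pay is exactly 10000 * Work_hour.

```r
x <- mtcars$hp
y <- mtcars$mpg

a <- cov(x, y) / var(x)      # slope
b <- mean(y) - a * mean(x)   # intercept
c(intercept = b, slope = a)  # about 30.10 and -0.068

all.equal(unname(coef(lm(mpg ~ hp, data = mtcars))), c(b, a))

# DF example: Total_pay = 10000 * Work_hour, so the fitted slope is 10000
DF <- data.frame(Work_hour = 1:7, Total_pay = seq(10000, 70000, by = 10000))
coef(lm(Total_pay ~ Work_hour, data = DF))
```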


Geometric distribution - dgeom

2018. 7. 2. 17:16

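# Geometric distribution

# In R, dgeom(x, prob) is the probability of x failures before the first success,
# i.e., of the first success occurring on trial x + 1

# Probability that, with a 20% success rate, the first success comes on the 5th trial

dgeom(4, 0.2)

# Probabilities that the first success comes on trial 1, 2, ..., 5

a<-dgeom(0:4, 0.2)

# Probability of at least one success within 5 trials (cumulative), same as pgeom(4, 0.2)

sum(a[1:5])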
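The correct reading of dgeom() follows from its formula, P(X = x) = (1 - p)^x p, for x failures before the first success. The sketch below verifies both probabilities from this post (p = 0.2, 5 trials).

```r
p <- 0.2

# P(first success on trial k) = (1 - p)^(k - 1) * p, i.e. dgeom(k - 1, p)
k <- 5
manual <- (1 - p)^(k - 1) * p
all.equal(manual, dgeom(k - 1, p))              # both 0.08192

# P(at least one success within 5 trials) = 1 - (1 - p)^5
all.equal(sum(dgeom(0:4, p)), 1 - (1 - p)^5)    # 0.67232
all.equal(pgeom(4, p), 1 - (1 - p)^5)
```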


ggplot2 - scale_x_discrete, scale_y_discrete - ordering x and y values

2018. 6. 29. 15:01

scale_x_discrete, scale_y_discrete: set the order of the values on a discrete x or y axis

Example:

ggplot() + ... +

scale_x_discrete(limits=c("a", "b", "c"))
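A discrete axis is ordered by the factor levels, which are alphabetical for plain character data. `limits` in scale_x_discrete() overrides that order, and values left out of `limits` are dropped from the plot. The sketch below uses a hypothetical three-row data frame. The ggplot2 part only runs if the package is installed.

```r
df <- data.frame(x = c("b", "c", "a"), y = c(2, 3, 1))

# Setting the factor levels is one way to fix the order of a discrete axis
df$x <- factor(df$x, levels = c("c", "a", "b"))
levels(df$x)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = x, y = y)) +
    geom_col() +
    scale_x_discrete(limits = c("a", "b", "c"))  # limits take precedence over the level order
  print(p)
}
```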


createDataPartition() - stratified random sampling

2018. 6. 26. 16:26

library(caret)

createDataPartition()

: A function for stratified random sampling. It samples at random while keeping the proportion of each stratum (class) the same as in the full data.

1) createDataPartition(y, times = 1, p = 0.5, list = TRUE, groups = min(5, length(y)))

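- y : the vector of outcomes to stratify on

- p : the proportion of the data to select (e.g., for the training set)

- list : if FALSE, the result is a matrix of row indices instead of a list

- groups : for a numeric y, the number of breaks in the quantiles used to form the groups

2) Example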

- df <- data.frame(replicate(10,sample(1:3, 20,rep=TRUE)))

- createDataPartition(y = df$X1, p = 0.7, list = FALSE, groups = min(2, length(df$X1)))
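Without caret, you can sketch stratified sampling in base R by sampling the same share of row numbers within each class. `stratified_index` is a hypothetical helper. Its rounding rule, round(p * n) per class, is an assumption and may differ from createDataPartition().

```r
# Hypothetical helper: sample a share p of the row numbers within each class of y
stratified_index <- function(y, p = 0.7) {
  idx.by.class <- split(seq_along(y), y)   # row numbers grouped by class
  chosen <- lapply(idx.by.class, function(idx) {
    idx[sample.int(length(idx), round(p * length(idx)))]
  })
  sort(unlist(chosen, use.names = FALSE))
}

set.seed(1)
tr <- stratified_index(iris$Species, p = 0.7)
table(iris$Species[tr])               # 35 of each species
prop.table(table(iris$Species[tr]))   # same 1/3 proportions as the full data
```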

[Source] "R sampling - the holdout method", by 나리

https://blog.naver.com/nyaminyam/221246413590

[Source] "How to use sample and createDataPartition", by 이해할때까지

https://blog.naver.com/wujuchoi/221058021095


dcast()

2018. 6. 26. 14:24

> library(reshape2)
> fm<-melt(id=1:4, french_fries)
> head(fm)
  time treatment subject rep variable value
1    1         1       3   1   potato   2.9
2    1         1       3   2   potato  14.0
3    1         1      10   1   potato  11.0
4    1         1      10   2   potato   9.9
5    1         1      15   1   potato   1.2
6    1         1      15   2   potato   8.8
> # dcast() seems to turn the melted data back into its original wide shape!
> # ...  => all remaining variables not listed in the formula
> x<-dcast(fm, time+treatment+subject+rep~variable)
> head(x)
  time treatment subject rep potato buttery grassy rancid painty
1    1         1       3   1    2.9     0.0    0.0    0.0    5.5
2    1         1       3   2   14.0     0.0    0.0    1.1    0.0
3    1         1      10   1   11.0     6.4    0.0    0.0    0.0
4    1         1      10   2    9.9     5.9    2.9    2.2    0.0
5    1         1      15   1    1.2     0.1    0.0    1.1    5.1
6    1         1      15   2    8.8     3.0    3.6    1.5    2.3
> x<-dcast(fm, time+treatment+subject+rep~...)
> head(x)
  time treatment subject rep potato buttery grassy rancid painty
1    1         1       3   1    2.9     0.0    0.0    0.0    5.5
2    1         1       3   2   14.0     0.0    0.0    1.1    0.0
3    1         1      10   1   11.0     6.4    0.0    0.0    0.0
4    1         1      10   2    9.9     5.9    2.9    2.2    0.0
5    1         1      15   1    1.2     0.1    0.0    1.1    5.1
6    1         1      15   2    8.8     3.0    3.6    1.5    2.3
> head(fm)
  time treatment subject rep variable value
1    1         1       3   1   potato   2.9
2    1         1       3   2   potato  14.0
3    1         1      10   1   potato  11.0
4    1         1      10   2   potato   9.9
5    1         1      15   1   potato   1.2
6    1         1      15   2   potato   8.8
> dcast(fm, time~variable) # only time is the id, so each cell holds many rows; dcast falls back to length, and 72 is that row count
Aggregation function missing: defaulting to length
   time potato buttery grassy rancid painty
1     1     72      72     72     72     72
2     2     72      72     72     72     72
3     3     72      72     72     72     72
4     4     72      72     72     72     72
5     5     72      72     72     72     72
6     6     72      72     72     72     72
7     7     72      72     72     72     72
8     8     72      72     72     72     72
9     9     60      60     60     60     60
10   10     60      60     60     60     60
> nrow(dplyr::filter(fm, time==1 & variable == "potato")) # filter() comes from dplyr
[1] 72

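You can sketch the same long-to-wide reshaping in base R with `reshape()`. Counting rows per cell with `xtabs()` reproduces the `length` default that dcast() falls back to. The small data frame below is made up for illustration.

```r
long <- data.frame(
  id       = rep(1:3, each = 2),
  variable = rep(c("potato", "buttery"), times = 3),
  value    = c(2.9, 0.0, 14.0, 0.0, 11.0, 6.4)
)

# Long -> wide: one row per id, one column per variable (like dcast(long, id ~ variable))
wide <- reshape(long, idvar = "id", timevar = "variable", direction = "wide")
wide

# If the id columns do not pick out one row per cell, the values must be aggregated;
# counting rows per cell mirrors dcast's fallback fun.aggregate = length
long$time <- c(1, 1, 1, 1, 2, 2)
ct <- xtabs(~ time + variable, data = long)
ct
```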


What is normalization?

2018. 6. 25. 17:41

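Normalization means transforming data so that variables become comparable, for example by bringing them onto the same range or by making their distributions similar.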
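Two common ways to do this are min-max normalization, which maps values onto [0, 1], and z-score standardization, which gives mean 0 and standard deviation 1. A short base-R sketch follows; the helper name `min_max` is hypothetical.

```r
x <- c(2, 4, 6, 8, 10)

# Min-max normalization: rescale to the range [0, 1]
min_max <- function(v) (v - min(v)) / (max(v) - min(v))
min_max(x)   # 0.00 0.25 0.50 0.75 1.00

# Z-score standardization: subtract the mean, divide by the standard deviation
z <- as.numeric(scale(x))
c(mean = mean(z), sd = sd(z))
```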



h-elena