# Blog Archives

## Solutions for Multicollinearity in Regression (2)

February 16, 2014

This post continues the discussion of multicollinearity in regression. First, it is necessary to introduce how to calculate the variance inflation factor (VIF) and the condition number in software such as R, which is straightforward: vif() in the car package and kappa() compute the VIF and the condition number, respectively. Consider the data from …
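
For reference, the two diagnostics named in this excerpt are usually defined as follows. This is a sketch in notation introduced here, not taken from the post: $R_j^2$ is the coefficient of determination from regressing predictor $j$ on the remaining predictors, and $\sigma_{\max}$, $\sigma_{\min}$ are the largest and smallest singular values of the design matrix $X$.

$$
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad
\kappa(X) = \frac{\sigma_{\max}(X)}{\sigma_{\min}(X)} = \sqrt{\frac{\lambda_{\max}(X^\top X)}{\lambda_{\min}(X^\top X)}}
$$

A VIF of 1 corresponds to $R_j^2 = 0$, and both quantities grow without bound as the predictors approach an exact linear dependence. Some texts define the condition number without the square root, so it is worth checking which convention a given source uses.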

## Plot 3D Topographic Map in R

February 7, 2014

Many packages provide functions for plotting maps, such as ggmap, GEOmap, and rworldmap. For visualizing a 2D topographic map, here is a good example. 3D topographic maps can also be plotted easily with some excellent functions and packages. The implementation methods include, but are not limited to, …

## Solutions for Multicollinearity in Regression (1)

February 3, 2014

In multiple regression analysis, multicollinearity is a common phenomenon in which two or more predictor variables are highly correlated. If there is an exact linear relationship (perfect multicollinearity) among the independent variables, the rank of X is less than k+1 (where k is the number of predictor variables), and the matrix X'X is not invertible. So the strong correlations …
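
The rank argument in this excerpt can be spelled out in one line. This is a sketch under two assumptions not stated in the excerpt: $X$ is the $n \times (k+1)$ design matrix including an intercept column, and the matrix that fails to be invertible is $X^\top X$ from the least-squares estimator $\hat\beta = (X^\top X)^{-1} X^\top y$. The vector $c$ is notation introduced here.

$$
\exists\, c \neq 0 : \; Xc = 0
\;\Longrightarrow\; X^\top X c = 0
\;\Longrightarrow\; \operatorname{rank}(X^\top X) = \operatorname{rank}(X) < k+1
$$

An exact linear relationship among the columns of $X$ is such a vector $c$, so $X^\top X$ is singular and $\hat\beta$ is not uniquely determined.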

## Visualization of AQI

February 2, 2014

The day before yesterday was the Spring Festival, one of the most famous Chinese festivals, and setting off firecrackers outdoors on New Year's Eve is a traditional custom. However, firecrackers severely pollute the environment and cause hazy weather. Of course, the pollution level differs from province to province, and charts let us view the pollution distribution directly. We …

## Playing Financial Data Series (1)

January 24, 2014

These days I have become interested in financial data, such as stock prices and exchange rates. There are many models available to fit, analyze, and predict these types of data: for instance, the basic time series model ARIMA(p,d,q), the GARCH model, and multivariate time series models such as the VARX model and state space models. …

## The number of clusters in Hierarchical Clustering

January 22, 2014

Cluster analysis is widely applied in data analysis. Hierarchical clustering is a simple and important clustering method. In brief, hierarchical clustering methods use the elements of a proximity matrix to generate a tree diagram, or dendrogram. From the tree diagram, we can draw our own conclusions about the clustering results. However, when …

## Happy new year

December 29, 2013

Although 2013 was not perfect for me, it still gave me a lot of happiness and beneficial experiences that are worth recalling. In 2014, numerous difficult problems need to be solved. Applications are still a headache, and the final tests are also troublesome. Nevertheless, 2014 is full of hope. I have …

## High frequency words in TOEFL

December 27, 2013

In general, TOEFL (Test of English as a Foreign Language) is not an easy test for Chinese students, including me. Relatively speaking, the reading section is a little easier than the other sections (listening, speaking, writing). Interestingly, when I prepared for my TOEFL test, I found that some important words appeared frequently in the mock examinations. So I did a …

## Merry Christmas

December 25, 2013

Merry Christmas. I am sorry to update this blog so late. I have been extremely busy in recent months because of my Ph.D. application. Here is a simple Christmas card powered by R to express my best wishes to you. I sincerely hope all your Christmas dreams come true! View card. By the way, if you are interested …

## PCA or SPCA or NSPCA?

November 15, 2013

Principal component analysis (PCA) is one of the classical methods in multivariate statistics. In addition, it is now widely used for data processing and dimension reduction. Besides statistics, PCA has numerous applications in engineering, biology, and other fields. PCA has two main optimality properties: it guarantees minimal information loss, and its principal components are uncorrelated. That's …
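
The two properties mentioned in this excerpt follow from the eigendecomposition of the covariance matrix. This is a sketch in notation introduced here, not taken from the post: $\Sigma$ is the covariance matrix of the centered data vector $x$, $V$ is the orthogonal matrix of its eigenvectors, and $\Lambda = \operatorname{diag}(\lambda_1 \ge \dots \ge \lambda_p)$ holds the eigenvalues.

$$
\Sigma = V \Lambda V^\top, \qquad z = V^\top x
\;\Longrightarrow\;
\operatorname{Cov}(z) = V^\top \Sigma V = \Lambda
$$

Because $\Lambda$ is diagonal, the principal components $z$ are uncorrelated. Keeping the first $m$ components retains variance $\lambda_1 + \dots + \lambda_m$, the largest amount any $m$-dimensional orthogonal projection can retain, which is the sense in which the information loss is minimal.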