Data Mining with R: Learning with Case Studies by Luis Torgo

By Luis Torgo

"The versatile capabilities and large set of add-on packages make R an excellent alternative to many existing and often expensive data mining tools. Exploring this area from the perspective of a practitioner, Data Mining with R: Learning with Case Studies uses practical examples to illustrate the power of R and data mining. Assuming no prior knowledge of R or data mining/statistical techniques, the book covers a

Plot() function, which plots the variable values against the theoretical quantiles of a normal distribution (solid black line). The function also plots an envelope with the 95% confidence interval of the normal distribution (dashed lines). As we can observe, several low values of the variable clearly break the assumption of a normal distribution at the 95% confidence level. The na.rm=T parameter setting is used in several functions to indicate that NA values should be ignored in the function's calculation.
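
The two ideas in this excerpt can be tried out with base R alone. The sketch below is only an illustration under stated assumptions: the vector x is synthetic data invented for this example (it is not the book's dataset), and base R's qqnorm() and qqline() stand in for whichever plotting function the excerpt refers to. Base R does not draw the 95% envelope; the qqPlot() function in the car package is one option that does.

```r
# Synthetic, right-skewed values with two missing entries.
# This vector is an assumption made for illustration only.
set.seed(1)
x <- c(rexp(50, rate = 0.2), NA, NA)

# Without na.rm, a single NA propagates to the result.
mean_with_na <- mean(x)

# With na.rm = TRUE, NA values are dropped before the calculation.
mean_x <- mean(x, na.rm = TRUE)
sd_x   <- sd(x, na.rm = TRUE)

# Normal Q-Q plot: sample values against theoretical normal quantiles.
# qqnorm() skips NA values itself and invisibly returns the plotted points.
qq <- qqnorm(x, main = "Normal Q-Q plot (synthetic data)")

# Reference line through the first and third quartiles.
qqline(x)
```

Points that fall far from the reference line, as the smallest values of a skewed sample typically do, suggest a departure from normality.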

The dots are the mean value of the frequency of the alga for the different river sizes. Vertical lines represent the 1st quartile, median, and 3rd quartile, in that order. The graphs show us the actual values of the data with small dashes, and the information on the distribution of these values is provided by the quantile plots. For instance, we can confirm our previous observation that smaller rivers have higher frequencies of this alga, but we can also observe that the observed frequencies for these small rivers are much more widely spread across the domain of frequencies than for other types of rivers.
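
The quantities that such a plot displays (a mean and three quartiles per river size) can also be computed directly. The following is a minimal sketch, assuming a hypothetical data frame named rivers with invented columns size and freq filled with synthetic numbers. It does not reproduce the book's figure or data.

```r
# Synthetic stand-in data: alga frequencies for three river sizes.
# All names and values here are assumptions made for illustration.
set.seed(42)
rivers <- data.frame(
  size = factor(rep(c("small", "medium", "large"), each = 30),
                levels = c("small", "medium", "large")),
  freq = c(rexp(30, rate = 1 / 25),
           rexp(30, rate = 1 / 12),
           rexp(30, rate = 1 / 6))
)

# Per-group mean, quartiles, and interquartile range (a measure of spread).
group_summary <- t(sapply(split(rivers$freq, rivers$size), function(v) {
  c(mean   = mean(v),
    q1     = quantile(v, 0.25, names = FALSE),
    median = median(v),
    q3     = quantile(v, 0.75, names = FALSE),
    iqr    = IQR(v))
}))

print(round(group_summary, 2))

# Show the individual observations per group, similar in spirit to the
# small dashes described above.
stripchart(freq ~ size, data = rivers, method = "jitter",
           xlab = "Frequency", ylab = "River size")
```

Comparing the iqr column across rows gives a numeric counterpart to the visual impression of how widely the frequencies are spread in each group.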

Each line of this data frame contains an observation of our dataset. 7 (page 16) we have described alternative ways of extracting particular elements of R objects like data frames.

4 Data Visualization and Summarization

Given the lack of further information on the problem domain, it is wise to investigate some of the statistical properties of the data, so as to get a better grasp of the problem. Even if that was not the case, it is always a good idea to start our analysis with some kind of exploratory data analysis similar to the one we will show below.
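
As a small illustration of both points (extracting elements of a data frame, and taking a first look at its statistical properties), here is a hedged sketch. The data frame demo and its column names are hypothetical and were made up for this example; only standard base R indexing and summary() are used.

```r
# A tiny, made-up data frame: each row is one observation.
demo <- data.frame(
  season = factor(c("winter", "spring", "autumn", "spring")),
  size   = factor(c("small", "small", "medium", "large")),
  ph     = c(8.00, 8.35, NA, 8.07),
  freq   = c(0.0, 1.4, 3.3, 24.3)
)

# Some common ways of extracting elements.
first_obs   <- demo[1, ]                   # one row (an observation)
freq_values <- demo$freq                   # one column as a vector
small_only  <- demo[demo$size == "small",  # rows matching a condition,
                    c("season", "freq")]   # restricted to two columns

# A quick statistical overview of every column: counts per level for
# factors; quartiles, mean, and the number of NAs for numeric columns.
print(summary(demo))
```

Running summary() first is a cheap way to spot missing values and skewed variables before choosing any modeling approach.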

