Solving Exponential and Logarithmic Equations


In research, many terms share the same underlying meaning, which can confuse researchers as they try to complete a project. People come from different backgrounds and learn different terms during their studies, so when they work with others there is often confusion over what is what.

In this post, we will try to clarify, as much as possible, the various terms used when referring to variables. We will look at the following during this discussion:

- Definition of a variable
- Minimum knowledge of the characteristics of a variable in research
- Various synonyms of variable

**Definition**

The word variable has the root “vary” and the suffix “able”. This literally means that a variable is something that is able to change. Examples include such concepts as height, weight, salary, etc. All of these concepts change as you gather data from different people. Statistics is primarily about trying to explain and/or understand the variability of variables.

However, to make things more confusing, there are times in research when a variable does not change or remains constant. This will be explained in greater detail in a moment.

**Minimum You Need to Know**

Two broad concepts that you need to understand, regardless of the specific variable terms you encounter, are the following:

- Whether the variable(s) are independent or dependent
- Whether the variable(s) are categorical or continuous

When we speak of independent and dependent variables we are looking at the relationship(s) between variables. Dependent variables are explained by independent variables. Therefore, one dimension of variables is understanding how they relate to each other and the most basic way to see this is independent vs dependent.

The second dimension to consider when thinking about variables is how they are measured, which is captured by the terms categorical and continuous. A categorical variable has a finite number of values that it can take. Examples include gender, eye color, or cellphone brand. A person can only be male or female, have blue or brown eyes, and can only have one brand of cellphone.

Continuous variables are variables that can take on an infinite number of values. Salary, temperature, etc. are all continuous in nature. It is possible to reduce a continuous variable to a categorical variable by creating intervals in which to place values. This is commonly done when creating bins for histograms. In sum, there are four general variable types:

- Independent categorical
- Independent continuous
- Dependent categorical
- Dependent continuous
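The binning idea mentioned above, turning a continuous variable into a categorical one by placing values in intervals, can be sketched with base R's `cut()` function. The salary figures, interval boundaries, and labels below are invented for illustration only.

```r
# Hypothetical salaries (continuous), invented for illustration
salary <- c(18000, 32000, 45000, 61000, 75000, 120000)

# Interval boundaries and labels chosen arbitrarily for this sketch
breaks <- c(0, 30000, 60000, 90000, Inf)
labels <- c("low", "medium", "high", "very high")

# cut() assigns each value to the interval it falls in and returns a factor
salary_band <- cut(salary, breaks = breaks, labels = labels)

# Count how many salaries landed in each band
table(salary_band)
```

The result is a factor, so the same underlying measurement can now be used wherever a categorical variable is expected.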

Naturally, most models have one dependent variable, which may be categorical or continuous. However, you can have any combination of continuous and categorical variables as independents. Remember that all variables have the above characteristics regardless of which term is used for them.

**Variable Synonyms**

Below is a list of various names that variables go by in different disciplines. This is by no means an exhaustive list.

*Experimental variable*

A variable whose values are independent of any changes in the values of other variables. In other words, an experimental variable is just another term for independent variable.

*Manipulated Variable*

A variable that is independent in an experiment but whose value/behavior the researcher is able to control or manipulate. This is also another term for an independent variable.

*Control Variable*

A variable whose value does not change. Controlling a variable helps to explain the relationship between the independent and dependent variable in an experiment by making sure the control variable does not influence the results.

*Responding Variable*

The dependent variable in an experiment. It responds to the experimental variable.

*Intervening Variable*

This is a hypothetical variable. It is used to explain the causal links between variables. Since intervening variables are hypothetical, they are not observed in an actual experiment. For example, suppose you look at the relationship between income and life expectancy and find a strong positive relationship. The intervening variable for this may be access to healthcare. People who make more money have more access to healthcare, and this contributes to them often living longer.

*Mediating Variable*

This is essentially the same thing as an intervening variable. The difference is that a mediating variable is not always hypothetical in nature and is often measured itself.

*Confounding Variable*

A confounder is a variable that influences both the independent and dependent variable, causing a spurious or false association. Confounding is a causal idea and cannot be described purely in terms of correlations or associations with other variables. It is easy to mistake a confounder for an intervening variable, but the two differ: a confounder affects both variables from outside, while an intervening variable sits on the path between them.
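A small simulation can make the spurious association concrete. This is only a sketch on made-up data: the variable names `x`, `y`, and `z`, the sample size, and the seed are all assumptions chosen for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is repeatable
n <- 5000

# z is the confounder: it drives both x and y
z <- rnorm(n)
x <- z + rnorm(n)  # x depends on z, not on y
y <- z + rnorm(n)  # y depends on z, not on x

# x and y are correlated even though neither influences the other
raw_cor <- cor(x, y)

# Once z is included in the model, the coefficient on x is close to zero
adjusted_slope <- unname(coef(lm(y ~ x + z))["x"])

c(raw_cor = raw_cor, adjusted_slope = adjusted_slope)
```

The raw correlation is clearly positive, yet the association largely disappears once the confounder is accounted for.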

*Explanatory Variable*

This is essentially the same as an independent variable. The difference is that, strictly speaking, an independent variable is not influenced by any other variables. When independence is not certain, the variable is called an explanatory variable.

*Predictor Variable*

A predictor variable is an independent variable. This term is commonly used for regression analysis.

*Outcome Variable*

An outcome variable is a dependent variable in the context of regression analysis.

*Observed Variable*

This is a variable that is measured directly. An example would be gender or height. No psychological construct is needed to infer the meaning of such variables.

*Unobserved Variable*

Unobserved variables are constructs that cannot be measured directly. In such situations, observed variables are used to try to determine the characteristics of the unobserved variable. For example, it is hard to measure addiction directly. Instead, other things will be measured to infer addiction, such as health, drug use, performance, etc. The measures of these observed variables will indicate the level of the unobserved variable of addiction.

*Features*

A feature is an independent variable in the context of machine learning and data science.

*Target Variable*

A target variable is the dependent variable in the context of machine learning and data science.
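As a rough sketch of the feature/target vocabulary, the snippet below splits a data frame into the two parts. It uses the `mtcars` data that ships with R, and treating `mpg` as the target is an arbitrary assumption made only for this example.

```r
# mtcars ships with R; treating mpg as the target is an arbitrary choice here
target_name <- "mpg"

# The target is the dependent variable
target <- mtcars[[target_name]]

# The features are every remaining column, i.e. the independent variables
features <- mtcars[, setdiff(names(mtcars), target_name)]

dim(features)
length(target)
```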

To conclude, below is a summary of the different variables discussed and whether they are independent, dependent, or neither.

| Independent | Dependent | Neither |
|---|---|---|
| Experimental | Responding | Control |
| Manipulated | Target | Explanatory |
| Predictor | Outcome | Intervening |
| Feature | | Mediating |
| | | Observed |
| | | Unobserved |
| | | Confounding |

You can see how confusing this can be. Even though variables are mostly independent or dependent, there is a class of variables that do not fall into either category. However, for most purposes, the first two columns cover the majority of needs in simple research.

**Conclusion**

The confusion over variables is mainly due to an inconsistency in terms across disciplines. There is nothing right or wrong about the different terms. They all developed in different places to address the same common problem. However, for students or those new to research, this can be confusing, and this post hopefully helps to clarify it.

It is common in research to want to visualize data in order to search for patterns. When the number of features increases, this can often become even more important. Common tools for visualizing numerous features include principal component analysis and linear discriminant analysis. Not only do these tools work for visualization, they can also be beneficial for dimension reduction.

However, the available tools are not limited to these two options. Another option for achieving either of these goals is t-Distributed Stochastic Neighbor Embedding (t-SNE). This relatively young algorithm (2008) is the focus of the post. We will explain what it is and provide an example using a simple dataset from the Ecdat package in R.

t-SNE is a nonlinear dimension reduction and visualization tool. Essentially, it places similar observations near each other so that clusters become visible. However, it is not a clustering algorithm; it only reduces the dimensions (normally to 2) for visualization. This means that the input features are no longer present in their original form, and this limits the ability to make inferences. Therefore, t-SNE is often used for exploratory purposes.

The nonlinear nature of t-SNE is what often makes it appear superior to PCA, which is linear. Without getting too technical, t-SNE takes both a global and a local approach to mapping points, while PCA can only use a global approach.

The downside of the t-SNE approach is that it requires a large amount of calculation. The calculations are often pairwise comparisons, and the number of pairs grows quadratically with the number of observations, which becomes expensive in large datasets.
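To make the "PCA is linear" point concrete, here is a minimal sketch using base R's `prcomp()`. The built-in `iris` data is only a stand-in for any table of numeric features and is not part of the analysis in this post.

```r
# iris ships with R and stands in here for any table of numeric features
features <- iris[, 1:4]

# Center and scale so no feature dominates because of its units
pca <- prcomp(features, center = TRUE, scale. = TRUE)

# Scores on the first two components: the 2-D coordinates one would plot
scores <- pca$x[, 1:2]

# PCA is linear: the scores are the scaled data times the loading matrix
is_linear <- isTRUE(all.equal(scale(features) %*% pca$rotation, pca$x,
                              check.attributes = FALSE))
is_linear
```

Every PCA coordinate is a weighted sum of the original features, which is the restriction a nonlinear method does not have. Calling `plot(scores)` would draw the two-dimensional view.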
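As a rough illustration of how quickly pairwise comparisons pile up, the number of distinct pairs among n observations is n(n - 1)/2. The helper name `pair_count` is made up for this sketch, and the sample sizes are arbitrary apart from 601, which is the number of rows used later in this post.

```r
# Number of distinct pairs among n observations: n * (n - 1) / 2
pair_count <- function(n) n * (n - 1) / 2

# A few sample sizes, including the 601 rows analyzed in this post
n <- c(100, 601, 10000, 100000)

data.frame(n = n, pairs = pair_count(n))
```

Going from 601 to 100,000 rows multiplies the row count by roughly 166 but the number of pairs by tens of thousands.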

We will use the “Rtsne” package for the analysis, and we will use the “Fair” dataset from the “Ecdat” package. The “Fair” dataset contains survey data on extramarital affairs. We want to see if we can find patterns among the respondents based on their occupation. Below is some initial code.

```
library(Rtsne)
library(Ecdat)
```

To prepare the data, we first remove rows with missing data using the “na.omit” function. This is saved in a new object called “train”. Next, we change our outcome variable into a factor variable. The categories range from 1 to 9:

- Farm laborer, day laborer,
- Unskilled worker, service worker,
- Machine operator, semiskilled worker,
- Skilled manual worker, craftsman, police,
- Clerical/sales, small farm owner,
- Technician, semiprofessional, supervisor,
- Small business owner, farm owner, teacher,
- Mid-level manager or professional,
- Senior manager or professional.

Below is the code.

```
# Drop rows with missing values, then treat occupation as a categorical variable
train <- na.omit(Fair)
train$occupation <- as.factor(train$occupation)
```
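To see what these two steps do, here is a minimal sketch on a toy data frame. The data frame `toy` and its values are invented for illustration and are not taken from the “Fair” dataset.

```r
# Toy data with missing values, invented for illustration
toy <- data.frame(occupation = c(1, 5, NA, 7, 5),
                  age        = c(32, 27, 42, NA, 57))

# na.omit() drops every row that has at least one missing value
clean <- na.omit(toy)

# as.factor() turns the numeric codes into category labels
clean$occupation <- as.factor(clean$occupation)

nrow(clean)
levels(clean$occupation)
```

Only the complete rows survive, and the levels of the factor are the distinct codes that remain.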

Before we do the analysis we need to set the colors for the different categories. This is done with the code below.

```
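# One color per occupation level, named so colors can be looked up by level
occupation_levels <- levels(train$occupation)
colors <- rainbow(length(occupation_levels))
names(colors) <- occupation_levels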
```

We can now do our analysis. We will use the “Rtsne” function. When you input the dataset, you must exclude the dependent variable as well as any other factor variables. You also set the dimensions and the perplexity. Perplexity is, loosely, the number of neighbors taken into account when determining the location of each data point. Verbose just provides information during the calculation. This is useful if you want to know what progress is being made. max_iter is the number of iterations used to complete the analysis, and check_duplicates controls whether the function checks for duplicate rows, which could be a problem in the analysis; here the check is turned off. Below is the code.

`tsne <- Rtsne(train[, -c(1, 4, 7)], dims = 2, perplexity = 30, verbose = TRUE, max_iter = 1500, check_duplicates = FALSE)`

```
## Performing PCA
## Read the 601 x 6 data matrix successfully!
## OpenMP is working. 1 threads.
## Using no_dims = 2, perplexity = 30.000000, and theta = 0.500000
## Computing input similarities...
## Building tree...
## Done in 0.05 seconds (sparsity = 0.190597)!
## Learning embedding...
## Iteration 1450: error is 0.280471 (50 iterations in 0.07 seconds)
## Iteration 1500: error is 0.279962 (50 iterations in 0.07 seconds)
## Fitting performed in 2.21 seconds.
```
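Because the duplicate check is switched off in the call above, it can be worth counting duplicate rows by hand first. The sketch below does this with base R on a toy data frame `x` invented for illustration. With the real data, the same calls could be applied to `train[,-c(1,4,7)]`.

```r
# Toy numeric data with one repeated row, invented for illustration
x <- data.frame(a = c(1, 2, 2, 3), b = c(10, 20, 20, 30))

# duplicated() flags every row that repeats an earlier row
n_dup <- sum(duplicated(x))

# Keep only the first occurrence of each row
x_unique <- x[!duplicated(x), ]

n_dup
nrow(x_unique)
```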

Below is the code for making the visual.

```
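# Empty plot frame, then draw each point as its occupation code
plot(tsne$Y, type = "n", main = "tsne", xlim = c(-30, 30), ylim = c(-30, 30))
# Look colors up by level name; indexing by the factor itself would use its integer codes
text(tsne$Y, labels = train$occupation, col = colors[as.character(train$occupation)])
legend(25, 5, legend = names(colors), col = colors, pch = 1)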
```

You can see that there are clusters; however, the different occupations are mixed together within them. This indicates that the features we used to make the two dimensions do not discriminate between the different occupations.

**Conclusion**

T-SNE is an improved way to visualize data. This is not to say that there is no place for PCA anymore. Rather, this newer approach provides a different way of quickly visualizing complex data without PCA's restriction to linear structure.
