# An Amazingly Elaborate Explanation of Data Analysis Methods

Data analysis is the process of extracting useful information from a given data series in order to support important decisions. As job opportunities for data analysts are on the rise, knowledge of data analysis methods is essential.

Techspirited Staff

Last Updated: Mar 19, 2018

Data analysis methods help us to understand facts, observe patterns, formulate explanations, and test hypotheses. They are used not only in all kinds of science and business processes, but also in administration and policy-making.

Data analysis can be carried out in all domains, including medicine and social sciences. All the analysis that is carried out is well-documented for future use.

Data Analysis Explained

Data analysis is defined as a practice in which unorganized or unfinished data is ordered and organized so that useful information can be highlighted. It involves processing and working on the data in order to understand what it contains.

To understand what is involved in data analysis, take a look at this example:

Between 1800 and 2000, the population of the United States increased from 5 million to 255 million people, i.e., a growth of 250 million. These figures are the facts. But to conclude that the population rose at an average rate of 1.25 million people per year (250 million divided by 200 years) would be misleading. The information would be correct and so would the arithmetic, but the interpretation, "an average growth rate of 1.25 million people per year", would be dead wrong, because the population of the US did not grow in that steady fashion, not even approximately.
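One way to see why the linear reading misleads is to compare it with a constant percentage growth model fitted to the same two endpoints. The sketch below uses only the two figures quoted in the example. The function names and the choice of a compound growth model are assumptions made for illustration, not a claim about how the US population actually evolved.

```python
def linear_average_growth(start, end, years):
    """Average absolute increase per year between two endpoints."""
    return (end - start) / years


def compound_growth_rate(start, end, years):
    """Constant yearly percentage rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


start_pop, end_pop, years = 5.0, 255.0, 200  # millions, taken from the example

per_year = linear_average_growth(start_pop, end_pop, years)
rate = compound_growth_rate(start_pop, end_pop, years)

# Population at the halfway point (year 1900) implied by each model.
linear_mid = start_pop + per_year * 100
compound_mid = start_pop * (1 + rate) ** 100

print(f"linear: {per_year:.2f} million/year, midpoint {linear_mid:.1f} million")
print(f"compound: {rate:.2%} per year, midpoint {compound_mid:.1f} million")
```

Both models pass through the same endpoints, yet they imply very different populations in between, which is why a single average cannot stand in for the shape of the growth.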

This is where correct data analysis methods and procedures come into the picture. Charts, graphs, and written summaries are some of the ways in which analyzed data is presented. These methods are designed to polish and refine the data so that end users can obtain interesting or useful information without having to go through the entire data themselves.

Qualitative Data Analysis

Qualitative research analysts define about 15 types of data analysis methods. Thirteen of them are described below:

Typology

A typology is a classification system derived from patterns, themes, or other groupings of data. Ideally, its categories should be mutually exclusive and exhaustive. Examples of categories include acts, activities, meanings, participation, relationships, and settings.

Analytic Induction

This is one of the oldest and most appreciated methods. An event is studied and a hypothetical statement is developed about what happened. Other similar events are then studied and checked against the hypothesis. If they do not fit, the hypothesis is revised. The analyst deliberately looks for exceptions to the hypothesis and revises it each time so that it suits all the examples encountered. Eventually, a hypothesis is developed that accounts for all the observed cases.

Taxonomy

This method is a complex classification containing multiple levels of concepts or abstractions. Higher levels include lower levels, forming superordinate and subordinate categories.

Domain Analysis

This type of analysis is mostly used to describe social and cultural situations and the patterns within them. The method starts by emphasizing what the social situation means to the participants, and then relates the situation to cultural meanings.

Constant Comparison/Grounded Theory

This method was developed in the 1960s, and has the following steps:

- Look at the document to be analyzed, such as a field note.
- Identify indicators that categorize events and behavior, then name and code them on the document.
- Compare the codes to find consistencies and deviations. This is done until the categories saturate, that is, until no new codes related to them are formed.
- Finally, certain categories become centrally-focused categories, more commonly known as core categories. These core categories are made the subjects of the case study.

Quasi-statistics

In this method, enumeration (counting how often something occurs in the data) is often used to provide evidence for the categories formed, or to determine whether observations are untrue.
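The counting step in quasi-statistics can be sketched in a few lines. The example below assumes that each passage in a set of field notes has already been given a single code. The codes and their frequencies are made up for illustration.

```python
from collections import Counter

# Hypothetical codes assigned to passages in a set of field notes.
coded_passages = [
    "participation", "settings", "participation",
    "meanings", "participation", "settings",
]

code_counts = Counter(coded_passages)
total = sum(code_counts.values())

# Report each code with its count and its share of all coded passages.
for code, count in code_counts.most_common():
    print(f"{code}: {count} ({count / total:.0%})")
```

Such counts are only a rough estimate of frequency, but they show at a glance which categories are well supported and which rest on a single mention.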

Event Analysis/Microanalysis

In this method, importance is given to finding the precise beginnings and endings of events by determining their specific boundaries, or the points that mark those boundaries. The method is specifically oriented towards film and video. After the end points are determined, repeated viewing can help us find phases within the event.

Metaphorical Analysis

Here, the analyst tries out various metaphors and checks how well they correspond with what is being observed. Participants may also be asked for metaphors, which are then interpreted. Take, for example, "Hallway as a highway." Participants may map the highway and its components in different ways, such as students as traffic and teachers as police.

Hermeneutical Analysis

Hermeneutics is the practice of interpretation. In this method, the aim is not to find the objective meaning of a text, but to interpret what the text means for the people involved in the situation. This is done by never overemphasizing oneself in the analysis and by retelling the people's story instead. The meaning of any content resides in the author's intent, the context, and the reader, and the method involves finding themes and relating these three.

Discourse Analysis

This method usually involves video taping of events, so that they can be played over and over again for deeper analysis.

Content Analysis

This method is not well suited to video, and it is qualitative only in the development of categories. Standard rules of categorization in content analysis include:

- The chunk of data to be analyzed at a time (a line, a sentence, a phrase, or a paragraph) must be identified.
- Categories must be inclusive and mutually exclusive.
- Categories should have precisely defined properties.
- All data must fit some category, i.e., the categorization must be exhaustive.
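The rules on mutual exclusivity and exhaustiveness can be checked mechanically once the categories are written down as explicit rules. The following is a minimal sketch under that assumption. The function `check_categorization` and the keyword rules are hypothetical, and real coding schemes are far richer than substring matches.

```python
def check_categorization(units, categories):
    """Return units that match no category and units that match several.

    `categories` maps a category name to a predicate over one unit of data.
    """
    uncategorized, overlapping = [], []
    for unit in units:
        matches = [name for name, belongs in categories.items() if belongs(unit)]
        if not matches:
            uncategorized.append(unit)             # breaks exhaustiveness
        elif len(matches) > 1:
            overlapping.append((unit, matches))    # breaks mutual exclusivity
    return uncategorized, overlapping


# Hypothetical keyword rules, one predicate per category.
categories = {
    "price": lambda s: "price" in s or "cost" in s,
    "quality": lambda s: "quality" in s,
}
units = [
    "the price was fair",
    "build quality is poor",
    "quality does not justify the cost",
    "delivery was late",
]

uncategorized, overlapping = check_categorization(units, categories)
print("uncategorized:", uncategorized)
print("overlapping:", overlapping)
```

A non-empty `uncategorized` list suggests a missing category, while a non-empty `overlapping` list suggests that category definitions need to be made more precise.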

Phenomenology/Heuristic Analysis

The emphasis here is on how individuals experience and explain the world. The heuristic side of the method emphasizes the effects of the research on the researcher's own personal experience. The term "phenomenology" refers to the study of experience as it is lived by the individual.

Narrative Analysis

This method overlaps with discourse analysis, but it gives more importance to the individual's story than to interaction. How narrators choose to frame what they tell decides how they will be perceived. Narrators tend to compare ideas about themselves and to avoid revealing negatives about themselves. This analysis can involve the study of literature, journals, or folklore.

Quantitative Data Analysis

For any data analysis, it is necessary to calculate the size of the sample to be drawn from the population under consideration. A formula commonly used for this (Slovin's formula) is as follows:

$$n = \frac{N}{1 + N e^{2}}$$

where;

N - the population size

e - the margin of error

n - the sample size
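A formula that is commonly written with exactly these three symbols is Slovin's formula, n = N / (1 + N e²). The sketch below assumes that formula. The function name and the choice to round up to the next whole unit are illustrative assumptions.

```python
import math


def slovin_sample_size(population_size, margin_of_error):
    """Sample size n = N / (1 + N * e**2), rounded up to a whole unit."""
    if population_size <= 0 or not 0 < margin_of_error < 1:
        raise ValueError("need N > 0 and 0 < e < 1")
    raw = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(raw)


# A population of 10,000 with a 5% margin of error.
print(slovin_sample_size(10_000, 0.05))
```

Note how slowly the required sample grows: tightening the margin of error has a much larger effect on n than increasing the population size does.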

Mean

$$\bar{X} = \frac{\sum X}{N}$$

where;

N - the total number of observations

X - Observations

The mean is the average of the observations in a sample. Its value is obtained by adding all the observations and then dividing the sum by the number of observations. The mean highlights the central value of the given sample data.
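As a quick check of the arithmetic, the mean can be computed by hand (sum divided by count) and compared with Python's standard `statistics.mean`. The observations below are made up for illustration.

```python
import statistics

observations = [4, 8, 15, 16, 23, 42]  # made-up sample

# Sum of all observations divided by the number of observations.
mean_by_formula = sum(observations) / len(observations)
mean_by_library = statistics.mean(observations)

print(mean_by_formula, mean_by_library)
```

Note that none of the observations equals the mean here, so the mean should be read as a central value rather than as a value that actually occurs in the data.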

Median

For data grouped into classes:

$$\text{Median} = L + \frac{\frac{N}{2} - f_{0}}{f_{w}} \times h$$

where;

N - Total number of observations

$f_{0}$ - Cumulative frequency of the classes before the median class

$f_{w}$ - Frequency of median class

L - Lower boundary of the median class

h - Width of the median class

Median is the middle value of a series of data when the data is arranged in ascending order, i.e., from the smallest value to the largest value. Half of the observations lie below it and half above it.
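For grouped data, the median is usually found by interpolating inside the class that contains the middle observation: lower boundary plus (N/2 minus the cumulative frequency before the class) divided by the class frequency, times the class width. The sketch below assumes that standard interpolation formula. The function name, the class boundaries, and the frequencies are hypothetical.

```python
def grouped_median(class_bounds, frequencies):
    """Median of grouped data by linear interpolation in the median class.

    class_bounds: list of (lower, upper) boundaries, in ascending order.
    frequencies:  number of observations in each class.
    """
    total = sum(frequencies)              # N
    half = total / 2
    cumulative = 0                        # cumulative frequency before the class
    for (lower, upper), freq in zip(class_bounds, frequencies):
        if freq > 0 and cumulative + freq >= half:
            width = upper - lower         # width of the median class
            return lower + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("no observations")


bounds = [(0, 10), (10, 20), (20, 30), (30, 40)]   # made-up classes
freqs = [5, 8, 12, 5]                               # made-up frequencies
print(grouped_median(bounds, freqs))
```

For ungrouped data no interpolation is needed, and `statistics.median` from the standard library returns the middle value directly.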

Mode

Mode corresponds to the tallest bar in a histogram. It is the value that is most common in the sample data. This concept is particularly useful when dealing with non-numeric data, for which a mean cannot be computed.
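Because the mode only needs counts, it works on labels as well as on numbers. The sketch below uses the standard `statistics` module with made-up data. `multimode` (available from Python 3.8) is used for the case where several values tie for most common.

```python
import statistics

# Made-up categorical sample: the mode is the most frequent label.
colours = ["red", "blue", "red", "green", "blue", "red"]
most_common_colour = statistics.mode(colours)
print(most_common_colour)

# Made-up numeric sample with a tie: multimode returns every tied value.
scores = [1, 2, 2, 3, 3]
tied_modes = statistics.multimode(scores)
print(tied_modes)
```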

Standard Deviation

$$\sigma = \sqrt{\frac{\sum F_{i}\,(x_{i} - \bar{X})^{2}}{\sum F_{i}}}$$

where;

$x_{i}$ - class mark

$\bar{X}$ - mean value

$F_{i}$ - Frequency

It is the value that gives the amount of deviation from the average value. A low standard deviation indicates that the data values lie close to the mean, while a high standard deviation indicates that they are spread over a wider range.
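With class marks and frequencies, the standard deviation can be computed as the square root of the frequency-weighted average of squared deviations from the mean. The sketch below assumes the population form (dividing by the total frequency rather than by one less). The function name and the data are made up for illustration.

```python
import math


def grouped_std_dev(class_marks, frequencies):
    """Population standard deviation of grouped data.

    Computes sqrt(sum(F_i * (x_i - mean)**2) / sum(F_i)).
    """
    total = sum(frequencies)
    mean = sum(f * x for x, f in zip(class_marks, frequencies)) / total
    squared_dev = sum(f * (x - mean) ** 2 for x, f in zip(class_marks, frequencies))
    return math.sqrt(squared_dev / total)


marks = [5, 15, 25, 35]      # class marks (made-up)
freqs = [2, 3, 3, 2]         # frequencies (made-up)
print(round(grouped_std_dev(marks, freqs), 3))
```

If the same frequencies were concentrated in the two middle classes, the result would shrink, reflecting values that sit closer to the mean.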

Variance

$$\sigma^{2} = \frac{\sum F_{i}\,(x_{i} - \bar{X})^{2}}{\sum F_{i}}$$

where;

$x_{i}$ - class mark

$\bar{X}$ - mean value

$F_{i}$ - Frequency

Variance is the value that indicates how scattered the values are around the mean. It is the average of the squared differences from the mean. Standard deviation is the square root of variance.
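The relationship between variance and standard deviation is easy to verify on ungrouped data, where every frequency is 1. The sketch below assumes the population form (dividing by the number of values) and compares a hand computation with the standard `statistics` module. The values are made up.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

mean = sum(values) / len(values)
# Average of the squared differences from the mean.
variance = sum((v - mean) ** 2 for v in values) / len(values)

print(variance)
print(statistics.pvariance(values))                      # library equivalent
print(math.sqrt(variance), statistics.pstdev(values))    # square root = std dev
```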

Range

Range = Highest Value - Smallest Value

The difference between the highest value and the smallest value is known as the range. It gives us a quick picture of the spread of our data. Because it depends only on the two extreme values, it is highly sensitive to outliers.
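The sensitivity of the range to extreme values is easy to demonstrate. In the sketch below, the helper name and the readings are made up. Adding a single extreme reading changes the range drastically even though the other values stay the same.

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


typical = [12, 15, 14, 13, 16]     # made-up readings
with_outlier = typical + [95]      # the same readings plus one extreme value

print(value_range(typical))
print(value_range(with_outlier))
```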

Coefficient of Variation

$$CV = \frac{\sigma}{\mu}$$

where;

σ - standard deviation

μ - mean value

The coefficient of variation is the dispersion of the values of a data series around the mean value, expressed relative to that mean. It is also known as unitized risk. In finance, it is used to assess the risk involved in investing in an asset relative to its expected return.
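Since the coefficient of variation divides the standard deviation by the mean, it allows series with the same average to be compared by their relative spread. The sketch below assumes the population standard deviation. The function name and the return figures for the two hypothetical assets are made up (`statistics.fmean` requires Python 3.8 or later).

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean


# Made-up yearly returns (in percent) for two hypothetical assets.
asset_a = [4, 5, 6, 5, 5]
asset_b = [1, 9, 2, 8, 5]

print(coefficient_of_variation(asset_a))
print(coefficient_of_variation(asset_b))
```

Both series average 5, but the second has a much larger coefficient of variation, which is what marks it as the riskier of the two in this reading.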

Standard Error

$$SE = \frac{s}{\sqrt{n}}$$

where;

s - standard deviation

n - number of observations

It is the standard deviation of a sampling distribution, most commonly that of the sample mean. It indicates how accurately a sample represents the entire population: the smaller the standard error, the closer the sample mean is likely to be to the actual population mean.
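The standard error of the mean is the standard deviation divided by the square root of the number of observations. The sketch below assumes that s is the sample standard deviation (the n - 1 form returned by `statistics.stdev`). The function name and the sample are made up.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: s / sqrt(n), s being the sample std dev."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(sample) / math.sqrt(n)


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
print(standard_error(sample))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.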

Pearson Product-Moment Correlation

This value gives the level of linear relationship between two variables. It is named after its developer, Karl Pearson. Conceptually, it draws the best fitting line through the data of the two variables and then measures how far the data values are from this line. The coefficient ranges between +1 and -1. A value greater than zero indicates a positive association and a value below zero indicates a negative association.
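The coefficient can be computed as the sum of the products of the deviations of both variables from their means, divided by the product of the square roots of their summed squared deviations. The sketch below follows that standard definition. The function name and the data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the spread of each variable.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (spread_x * spread_y)


hours = [1, 2, 3, 4, 5]            # made-up data
scores = [52, 55, 61, 64, 70]
print(pearson_r(hours, scores))                    # positive association
print(pearson_r(hours, [-s for s in scores]))      # negative association
```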

Regression Analysis

Y = a + bX

$$b = \frac{N \sum XY - \sum X \sum Y}{N \sum X^{2} - \left(\sum X\right)^{2}}, \qquad a = \frac{\sum Y - b \sum X}{N}$$

where;

N - Number of pairs of observations

ΣX - Sum of all values of X

ΣY - Sum of all values of Y

ΣXY - Sum of the product of X and Y

ΣX² - Sum of squared values of X

In its simple linear form, regression analysis helps determine the relationship between two variables, of which one is dependent and the other is independent. More generally, the analysis determines the behavior of the dependent variable when one of the independent variables is varied and the others are kept fixed.
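The sums listed above are exactly the ingredients of the usual least-squares estimates for a straight line, with slope b = (NΣXY - ΣXΣY) / (NΣX² - (ΣX)²) and intercept a = (ΣY - bΣX) / N, where N is the number of data pairs. The sketch below assumes those standard formulas. The function name and the data are made up.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    sum_x = sum(xs)                                  # sum of all X
    sum_y = sum(ys)                                  # sum of all Y
    sum_xy = sum(x * y for x, y in zip(xs, ys))      # sum of products X*Y
    sum_x2 = sum(x * x for x in xs)                  # sum of squared X
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all X values are identical; slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = (sum_y - b * sum_x) / n
    return a, b


xs = [1, 2, 3, 4, 5]          # made-up data
ys = [3, 5, 7, 9, 11]         # lies exactly on Y = 1 + 2X
a, b = fit_line(xs, ys)
print(a, b)
```

With real, noisy data the points will not lie exactly on the fitted line, and the Pearson coefficient indicates how closely they follow it.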

Thus, data analysis methods have multiple aspects and approaches, along with diverse techniques and a variety of names. They are used in different domains such as business, science, and social science. This field is a very complex one, and its many methods are not easy to learn without training and practice under expert guidance.