Make Data Science Easier: What is Data Analysis?
When you think about data analysis, what often comes to mind is looking at charts, graphs, or numbers and really trying to dig into those numbers to figure out what they mean, or at least to reproduce them with a model.
Hi! Teacher Bruno here. I would suggest that you think about it more like a translation process: you’re taking a lot of data, which may be numbers or anything else, and trying to extract information from it and communicate it to other humans, so that you’re not operating on data in a vacuum. Take our Mars data: we’re not just looking at it as numbers; we’re really asking what we can pull out of this data that would advance science, medicine, astronomy, or whatever the domain is.
What is data analysis in research?
Definition of research in data analysis: According to LeCompte and Schensul, research data analysis is a process researchers use to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.
Three essential things take place during the data analysis process. The first is data organization. The second is data reduction, achieved through summarization and categorization together; it helps in finding patterns and themes in the data for easy identification and linking. The third and last is data analysis itself, which researchers carry out in either a top-down or a bottom-up fashion.
Marshall and Rossman, on the other hand, describe data analysis as a messy, ambiguous, and time-consuming, but creative and fascinating, process through which a mass of collected data is brought to order, structure, and meaning.
We can say that:
“Data analysis and interpretation is a process representing the application of deductive and inductive logic to research.”
Why analyze data in research?
Researchers rely heavily on data because they have a story to tell or problems to solve. It starts with a question, and data is nothing but an answer to that question. But what if there is no question to ask? It is possible to explore data even without a problem. We call this ‘Data Mining’, and it often reveals interesting patterns within the data that are worth exploring.
Regardless of the type of data researchers explore, their mission and their audience’s vision guide them to find the patterns that shape the story they want to tell. One of the essential things expected of researchers while analyzing data is to stay open and unbiased toward unexpected patterns, expressions, and results. Remember, data analysis sometimes tells the most unforeseen yet exciting stories, ones that were not expected when the analysis began. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research.
Types of data in research
Every kind of data describes things once a specific value has been assigned to it. For analysis, these values need to be organized, processed, and presented in a given context to make them useful. Data can take different forms; here are the primary data types.
- Qualitative data: When the data presented consists of words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze in research, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is considered qualitative data. This type of data is usually collected through focus groups, personal interviews, or open-ended questions in surveys.
- Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. For example, questions about age, rank, cost, length, weight, scores, etc. all come under this type of data. You can present such data in graphs or charts, or apply statistical analysis methods to it. Outcomes Measurement Systems (OMS) questionnaires in surveys are a significant source of numeric data.
- Categorical data: This is data presented in groups. An item included in categorical data cannot belong to more than one group. Example: a survey respondent’s lifestyle, marital status, smoking habit, or drinking habit is categorical data. A chi-square test is a standard method used to analyze this data.
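The following Python sketch shows the chi-square test mentioned above. It computes the chi-square statistic of independence and its degrees of freedom from raw (row, column) category pairs, using only the standard library. The smoking-habit and marital-status responses are hypothetical. The sketch omits the p-value and Yates’ continuity correction; a statistics library such as SciPy’s `scipy.stats.chi2_contingency` would normally supply both.

```python
from collections import Counter


def chi_square_independence(pairs):
    """Chi-square test of independence for categorical data.

    `pairs` is a list of (row_category, column_category) tuples, one per
    respondent. Returns (statistic, degrees_of_freedom). No continuity
    correction is applied.
    """
    observed = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    n = len(pairs)

    statistic = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            # Count expected in this cell if the two variables were independent.
            expected = row_total * col_total / n
            statistic += (observed[(row, col)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical survey: smoking habit vs. marital status for 80 respondents.
responses = (
    [("smoker", "married")] * 10
    + [("smoker", "single")] * 20
    + [("non-smoker", "married")] * 30
    + [("non-smoker", "single")] * 20
)
statistic, dof = chi_square_independence(responses)
print(f"chi-square = {statistic:.2f}, dof = {dof}")  # chi-square = 5.33, dof = 1
```

A 2×2 table has 1 degree of freedom, and the 5% critical value at that df is 3.841. A statistic above that value would suggest the two variables are not independent.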
Data analysis in qualitative research
Data analysis in qualitative research works a little differently from numerical data analysis, because qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insight from such complicated information is a complicated process, so this kind of analysis is typically used for exploratory research.
Finding patterns in the qualitative data
Although there are several ways to find patterns in textual information, the word-based method is the most relied-on and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is often manual: researchers usually read the available data and find repetitive or commonly used words.
For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find “food” and “hunger” are the most commonly used words and will highlight them for further analysis.
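Part of the word-based method can be automated. The following Python sketch counts the most frequent words across free-text answers, so a researcher can decide which ones, such as “food” and “hunger” in the example above, deserve a closer look. The responses and the small stop-word list are hypothetical. Real projects usually rely on a larger, curated stop-word list and still read the responses.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); real projects use a larger, curated one.
STOP_WORDS = {"the", "a", "and", "of", "to", "is", "in", "we", "our", "there", "not", "for"}


def most_common_words(responses, top_n=5):
    """Return the top_n most frequent non-stop-words across all responses."""
    counts = Counter()
    for text in responses:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(top_n)


# Hypothetical open-ended survey answers.
responses = [
    "Hunger is the biggest problem in our village.",
    "Food prices keep rising and hunger is everywhere.",
    "We need clean water and food.",
    "There is not enough food for the children.",
]
print(most_common_words(responses, top_n=2))  # [('food', 3), ('hunger', 2)]
```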
Keyword in context is another widely used word-based technique. In this method, the researcher tries to understand a concept by analyzing the context in which participants use a particular keyword.
For example, researchers conducting research and data analysis for studying the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’
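The following Python sketch shows a simple keyword-in-context listing. It collects every occurrence of a keyword together with a few words on either side, so the researcher can read how respondents use the term. The excerpts are hypothetical, and `window` (the number of context words kept on each side) is a parameter chosen for illustration.

```python
import re


def keyword_in_context(texts, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence of keyword.

    `window` is the number of words kept on each side of the keyword.
    """
    keyword = keyword.lower()
    hits = []
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((left, token, right))
    return hits


# Hypothetical interview excerpts.
excerpts = [
    "My mother has diabetes so we cook without sugar.",
    "I worry that diabetes runs in my family.",
]
for left, word, right in keyword_in_context(excerpts, "diabetes"):
    print(f"{left:>20} [{word}] {right}")
```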
The scrutiny-based technique is also one of the highly recommended text analysis methods for identifying patterns in qualitative data. Compare and contrast is the most widely used method under this technique; it shows how specific pieces of text are similar to or different from each other.
For example, to find out the “importance of a resident doctor in a company,” the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast works best for analyzing polls with single-answer questions.
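The following Python sketch shows one simple version of compare and contrast. It splits responses by stance, counts each group, and compares the two groups’ vocabularies to see which words are shared and which are distinctive. The stance labels and responses are hypothetical, and comparing word sets is only one way to contrast the groups.

```python
import re
from collections import Counter


def words(text):
    """Lower-cased set of words in a response."""
    return set(re.findall(r"[a-z]+", text.lower()))


def compare_and_contrast(responses):
    """Split (stance, text) responses into two groups and compare their vocabulary.

    Returns the size of each group, the words both groups use, and the
    words that appear in only one group.
    """
    sizes = Counter(stance for stance, _ in responses)
    vocab = {"necessary": set(), "unnecessary": set()}
    for stance, text in responses:
        vocab[stance] |= words(text)
    shared = vocab["necessary"] & vocab["unnecessary"]
    return {
        "sizes": dict(sizes),
        "shared": shared,
        "only_necessary": vocab["necessary"] - shared,
        "only_unnecessary": vocab["unnecessary"] - shared,
    }


# Hypothetical poll answers about hiring a resident doctor.
poll = [
    ("necessary", "A doctor on site saves time in emergencies"),
    ("necessary", "Emergencies happen and a doctor helps"),
    ("unnecessary", "Too expensive for a small office"),
    ("unnecessary", "A clinic nearby saves money"),
]
result = compare_and_contrast(poll)
print(result["sizes"], sorted(result["shared"]))
```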
Metaphors can be used to reduce the data pile and find patterns in it, making it easier to connect data with theory. Variable partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations in the enormous amount of data.
Methods used for data analysis in qualitative research
There are several techniques for analyzing data in qualitative research, but here are some commonly used methods:
Content Analysis:
It is the most widely accepted and frequently employed technique for data analysis in research methodology. It can be used to analyze documented information from text, images, and sometimes physical items. The research questions determine when and where to use this method.
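The following Python sketch shows content analysis applied to text with a codebook. Each code is defined by a few indicator keywords, and the sketch counts how many documents mention each code at least once. The codebook and documents are hypothetical, and keyword matching is a deliberate simplification; in practice, content analysis usually involves human coders who apply and refine the codebook.

```python
import re
from collections import Counter

# Hypothetical codebook: each code is defined by a few indicator keywords.
CODEBOOK = {
    "cost": {"price", "prices", "expensive", "afford"},
    "access": {"distance", "far", "transport", "clinic"},
    "quality": {"rude", "wait", "waiting", "quality"},
}


def code_documents(documents, codebook=CODEBOOK):
    """Count how many documents mention each code at least once."""
    counts = Counter()
    for doc in documents:
        doc_words = set(re.findall(r"[a-z]+", doc.lower()))
        for code, keywords in codebook.items():
            if doc_words & keywords:
                counts[code] += 1
    return counts


# Hypothetical documents (e.g., transcribed complaints).
documents = [
    "The clinic is too far and transport is expensive.",
    "Prices went up again.",
    "Long waiting times and rude staff.",
]
print(code_documents(documents))
```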
Narrative Analysis:
This method is used to analyze content gathered from various sources such as personal interviews, field observation, and surveys. Most of the time, the stories or opinions people share are used to find answers to the research questions.
Discourse Analysis:
Similar to narrative analysis, discourse analysis is used to analyze interactions with people. However, this method also considers the social context in which the communication between the researcher and respondent takes place. In addition, discourse analysis takes the respondent’s lifestyle and day-to-day environment into account when deriving any conclusion.
Grounded Theory:
When you want to explain why a particular phenomenon happened, grounded theory is the best resort for analyzing qualitative data. Grounded theory is applied to study data about a host of similar cases occurring in different settings. When researchers use this method, they might alter explanations or produce new ones until they arrive at a conclusion.
Data analysis in quantitative research
Preparing data for analysis
The first stage of research and data analysis is to prepare the data so that the raw data can be converted into something meaningful. Data preparation consists of three phases: data validation, data editing, and data coding.
Data Validation
Data validation is done to determine whether the collected data sample meets the pre-set standards or is a biased sample. It is divided into four stages:
- Fraud: To ensure an actual human being recorded each response to the survey or questionnaire
- Screening: To make sure each participant or respondent was selected in compliance with the research criteria
- Procedure: To ensure ethical standards were maintained while collecting the data sample
- Completeness: To ensure that the respondent answered all the questions in an online survey or, for interviews, that the interviewer asked all the questions in the questionnaire
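The following Python sketch automates two of the validation stages above, screening and completeness, for individual survey records. The minimum age, question IDs, and records are hypothetical. Fraud and procedure checks usually need information beyond the answers themselves, such as response metadata or consent records, so they are not sketched here.

```python
# Hypothetical eligibility rule and question IDs, for illustration only.
MIN_AGE = 18
REQUIRED_QUESTIONS = ("q1", "q2", "q3")


def validation_problems(record):
    """Return the validation stages a single survey record fails."""
    problems = []
    # Screening: the respondent must meet the (assumed) research criterion.
    age = record.get("age")
    if age is None or age < MIN_AGE:
        problems.append("screening")
    # Completeness: every required question needs a non-empty answer.
    if any(record.get(q) in (None, "") for q in REQUIRED_QUESTIONS):
        problems.append("completeness")
    return problems


# Hypothetical survey records.
records = [
    {"id": 1, "age": 34, "q1": "yes", "q2": "no", "q3": "often"},
    {"id": 2, "age": 16, "q1": "yes", "q2": "", "q3": "never"},
]
for r in records:
    print(r["id"], validation_problems(r) or "ok")
```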
Data Editing
More often than not, a large research data sample comes loaded with errors. Respondents sometimes fill in some fields incorrectly or skip them accidentally. Data editing is the process in which researchers confirm that the provided data is free of such errors. They need to conduct the necessary consistency checks and outlier checks to edit the raw data and make it ready for analysis.
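The following Python sketch shows one common outlier check, the 1.5 × IQR rule (Tukey’s fences). It flags values that lie more than 1.5 interquartile ranges outside the first or third quartile. The rule, the multiplier `k`, and the ages are assumptions for illustration. Flagged values should be reviewed against the raw responses rather than deleted automatically.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical ages from a survey; 240 is most likely a data-entry error.
ages = [23, 25, 27, 29, 31, 33, 35, 240]
print(iqr_outliers(ages))  # [240]
```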
Data Coding
Out of all three, this is the most critical phase of data preparation. It involves grouping and assigning values to the survey responses. For example, if a survey is completed with a sample size of 1,000, the researcher might create age brackets to group respondents by age. It then becomes easier to analyze small data buckets than to deal with the massive data pile.
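The following Python sketch codes ages into brackets and counts each bucket. The bracket edges and sample ages are hypothetical. `bisect_right` finds which bracket an age falls into, so non-integer ages are also handled.

```python
import bisect
from collections import Counter

# Hypothetical bracket edges: each edge starts a new bracket.
EDGES = [18, 25, 35, 45, 55, 65]
LABELS = ["under 18", "18-24", "25-34", "35-44", "45-54", "55-64", "65+"]


def age_bracket(age):
    """Map an age to its bracket label."""
    return LABELS[bisect.bisect_right(EDGES, age)]


# Hypothetical respondent ages.
ages = [19, 22, 30, 41, 67]
buckets = Counter(age_bracket(a) for a in ages)
print(buckets)
```

In a real survey of 1,000 respondents, the analysis would then work with the handful of bucket counts instead of 1,000 individual ages.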
After the data is prepared, researchers can use different research and data analysis methods to derive meaningful insights. Statistical techniques are the most favored for analyzing numerical data. These techniques fall into two groups: ‘Descriptive statistics’, used to describe data, and ‘Inferential statistics’, which help in comparing the data and generalizing from it.
Descriptive statistics
This method is used to describe the basic features of various types of data in research. It presents the data in a meaningful way so that patterns in the data start making sense. However, descriptive analysis does not go beyond summarizing the data; any conclusions are based on the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods:
- Measures of Frequency: Count, Percent, Frequency
  - Used to denote how often a particular event occurs.
  - Researchers use them when they want to show how often a response is given.
- Measures of Central Tendency: Mean, Median, Mode
  - Widely used to describe the central points of a distribution.
  - Researchers use them when they want to show the most common or average response.
- Measures of Dispersion or Variation: Range, Variance, Standard deviation
  - Used to identify the spread of scores by stating intervals.
  - Researchers use them to show how spread out the data is, which directly affects how well the mean represents it.
- Measures of Position
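The descriptive measures listed above can be computed with Python’s standard `statistics` module. The following sketch uses hypothetical test scores. Quartile cut points stand in as one assumed example of a measure of position. Variance and standard deviation are the sample (n − 1) versions.

```python
import statistics
from collections import Counter


def describe(values):
    """Compute common descriptive statistics for a list of numbers."""
    counts = Counter(values)
    n = len(values)
    return {
        # Measures of frequency
        "count": n,
        "frequency": dict(counts),
        "percent": {v: 100 * c / n for v, c in counts.items()},
        # Measures of central tendency
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Measures of dispersion (sample variance and standard deviation)
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "stdev": statistics.stdev(values),
        # Quartile cut points, used here as an example measure of position
        "quartiles": statistics.quantiles(values, n=4),
    }


scores = [72, 85, 85, 90, 64, 78, 85, 90]  # hypothetical student test scores
summary = describe(scores)
print(summary["mean"], summary["median"], summary["mode"])  # 81.125 85.0 85
```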
In quantitative market research, descriptive analysis often gives absolute numbers, but that analysis alone is never sufficient to demonstrate the rationale behind those numbers. It is therefore necessary to choose the research and data analysis method that best suits your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students’ average scores in schools. Descriptive statistics are the better choice when researchers intend to keep the research or outcome limited to the provided sample without generalizing it. For example, when you want to compare the average voting in two different cities, descriptive statistics are enough.
Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.
Inferential statistics
Inferential statistics are used to make predictions about a larger population after research and data analysis of a collected sample that represents that population. For example, you can ask around 100 members of the audience at a movie theater whether they like the movie they are watching. Researchers then use inferential statistics on the collected sample to reason that about 80 to 90% of people like the movie. Here are two significant areas of inferential statistics.
Estimating parameters: This uses statistics from the sample research data to say something about a population parameter.
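The following Python sketch estimates a population proportion from a sample. It computes a confidence interval for the share of moviegoers who like the film. The 85-out-of-100 figure is hypothetical. The method is the normal-approximation (Wald) interval, chosen here for simplicity. The Wilson interval is often preferred for small samples or proportions close to 0 or 1.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a population proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1], since a proportion cannot fall outside that range.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Hypothetical: 85 of 100 surveyed moviegoers say they like the movie.
low, high = proportion_confidence_interval(85, 100)
print(f"95% CI: {low:.1%} to {high:.1%}")  # 95% CI: 78.0% to 92.0%
```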
Hypothesis test: This is about sampling research data to answer the survey research questions. For example, researchers might want to understand whether the new shade of lipstick recently launched is good or not, or whether multivitamin capsules help children perform better at games.
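The following Python sketch shows an exact permutation test, one hypothesis test suited to the multivitamin question above. It asks how often a random regrouping of all the children’s game scores produces a mean difference at least as large as the one observed. The scores are hypothetical. The permutation test was chosen here because it needs no distribution tables; a t-test is a common alternative. Enumerating every regrouping only works for small groups, so larger studies would sample regroupings at random instead.

```python
from itertools import combinations
from statistics import mean


def permutation_test(treatment, control):
    """Exact one-sided permutation test for a difference in means.

    Returns the share of all possible regroupings of the pooled scores whose
    mean difference is at least as large as the observed one (the p-value).
    """
    pooled = treatment + control
    k = len(treatment)
    observed = mean(treatment) - mean(control)
    as_extreme = total = 0
    for idx in combinations(range(len(pooled)), k):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so regroupings that tie the observed difference count.
        if mean(group_a) - mean(group_b) >= observed - 1e-9:
            as_extreme += 1
    return as_extreme / total


# Hypothetical game scores for children with and without multivitamins.
with_vitamins = [14, 15, 17, 18, 20]
without_vitamins = [12, 13, 15, 16, 14]
print(f"p = {permutation_test(with_vitamins, without_vitamins):.4f}")
```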
Beyond these, there are more sophisticated analysis methods that show the relationship between different variables instead of describing a single variable. They are often used when researchers want something beyond absolute numbers to understand how variables relate. Here are some of the commonly used methods for data analysis in research.
Correlation:
When researchers are not conducting experimental research but want to understand the relationship between two or more variables, they opt for correlational research methods.
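The following Python sketch computes the Pearson correlation coefficient between two variables. Values near +1 or −1 indicate a strong linear relationship, and values near 0 indicate a weak one. The study-hours data is hypothetical. Python 3.10+ also offers `statistics.correlation`. As with any correlational method, a strong r does not by itself establish causation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either variable is constant.
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observational data: weekly study hours vs. exam score.
hours = [2, 4, 5, 7, 9]
scores = [58, 62, 70, 74, 85]
print(f"r = {pearson_r(hours, scores):.2f}")
```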
Cross-tabulation:
Also called contingency tables, cross-tabulation is used to analyze the relationship between multiple variables. Suppose the data has age categories presented in rows and gender categories in columns. A two-dimensional cross-tabulation makes the analysis straightforward by showing the number of males and females in each age category.
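The following Python sketch builds the age-by-gender cross-tabulation described above from raw (age group, gender) pairs. Cells with no respondents show 0. The respondent data is hypothetical. A library such as pandas (`pandas.crosstab`) does the same for larger datasets.

```python
from collections import Counter


def cross_tab(pairs):
    """Build a nested dict {row_category: {column_category: count}}."""
    counts = Counter(pairs)
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}


# Hypothetical respondents as (age group, gender) pairs.
respondents = [
    ("18-24", "female"),
    ("18-24", "male"),
    ("18-24", "female"),
    ("25-34", "male"),
]
table = cross_tab(respondents)
for age_group, row in table.items():
    print(age_group, row)
```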
Regression analysis:
To understand the strength of the relationship between two variables, researchers commonly turn to regression analysis, which is also a type of predictive analysis. In this method, there is an essential factor called the dependent variable, along with one or more independent variables. The goal is to find out the impact of the independent variables on the dependent variable. The values of both the independent and dependent variables are assumed to have been collected in an error-free, random manner.
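The following Python sketch fits a least-squares line for regression analysis with a single independent variable. This is a simplification: the slope gives the estimated change in the dependent variable per unit of the independent one. The training-hours data is hypothetical. With several independent variables, least squares is usually solved with matrix methods, for example `numpy.linalg.lstsq`.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (one predictor)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours of training (independent) vs. test score (dependent).
hours = [1, 2, 3, 4, 5]
score = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(hours, score)
print(f"score ≈ {slope:.2f} * hours + {intercept:.2f}")
```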
Analysis of variance (ANOVA):
This statistical procedure is used for testing the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation between the groups, relative to the variation within them, suggests that the research findings are significant. ANOVA testing and analysis of variance are two names for the same procedure.
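The procedure in the paragraph above is one-way analysis of variance (ANOVA). The following Python sketch computes its F statistic, the ratio of the between-group mean square to the within-group mean square. A large F means the groups differ more than their internal scatter would explain. The teaching-method scores are hypothetical. The sketch stops at F; converting F to a p-value needs the F distribution, which `scipy.stats.f_oneway` provides if SciPy is available.

```python
def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA across two or more groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variation: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: scatter of values around their own group mean.
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, group_means))

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical test scores under three teaching methods.
method_a = [70, 72, 75, 71]
method_b = [78, 80, 79, 83]
method_c = [65, 68, 66, 70]
print(f"F = {one_way_anova_f(method_a, method_b, method_c):.2f}")
```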
What’s the knowledge that’s in there? Whatever you pull out of your data needs to be communicable to humans, and I think that should be a guiding directive when you’re working with large data sets.
That’s it! I hope this reading has been helpful. Contact us to learn more!
Stay tuned for the continuation. Thanks!