The most immediate findings from a research study are the
results of the data analysis. The results might be a statistically
significant difference between two groups, the portion of a population
that holds a certain view, or a theme that pervades a given social situation.
These results are often accompanied by conclusions, implications, predictions,
and recommendations, all of which are derivative findings.
Reviews of the literature often focus heavily, if not exclusively, on the results of the individual studies, in an effort to make the most informed inference possible about the nature of some phenomenon. For that reason, this lesson will first address integrating results, and then discuss integrating various forms of derivative findings. The integration of results will be discussed separately for quantitative and qualitative studies.
The almost universal challenge when integrating results across a set of social science studies on a given topic is this: the studies vary and the results vary. Even though all the examined studies appear to be on the same topic, they always vary at least somewhat in their contexts (social circumstances, economic conditions, etc.), conceptual frameworks, sampled subjects or participants, treatments applied or naturally occurring interventions being studied, and methodology. They also always vary to some degree in their results, and indeed it is common to have some results that appear to contradict others. In short, consistency is not the norm in social science, and the challenge of the reviewer is to make the best possible inferences from a set of applicable studies despite those inconsistencies.
Fraudulent Integration of Results
There are two widely used fraudulent means of integrating research results. They are used deliberately to misrepresent what the research says on a given subject:
Means for Integrating Results Across Quantitative Studies
The results of studies can be expressed in many metrics, such as mean differences, various types of correlations, and statistical significance. Since it is rare that all research studies use the same measuring instruments, one of the prerequisites for integration across studies is that the results be expressed in some common metric. Three common metrics are often used for quantitative research: the direction of the results (positive or negative), whether or not the result is statistically significant, and the effect size.
Whether the result is positive or negative is a very crude metric. It ignores the magnitude of the result and also whether a result from a sample could easily have occurred from random sampling error. Despite those limitations, the direction of results is often indicated or implied in research reports, and it can be revealing when examined over a full set of studies.
The statistical significance of a result is a more sophisticated measure, indicating whether a result from a sample is likely to have occurred by chance when the phenomenon of interest does not exist in the full population from which the sample was drawn. But statistical significance can occur even when the phenomenon of interest in the population is weak, provided the sample size is large (over 1,000), in which case the results are usually trivial. Conversely, if the phenomenon in the population is of only modest magnitude, and most of the research studies had small sample sizes (less than 50), only a few of the studies are likely to show statistically significant results. Despite these limitations, significance levels, when used correctly, can be helpful when integrating results across studies.
Effect sizes are a measure of magnitude relative to the variability in the measure. For the difference between two groups' mean values, the effect size is computed as the ratio of the difference between the two groups' mean values to their pooled standard deviation. In other words:

effect size = (mean of group 1 - mean of group 2) / pooled standard deviation
Mathematically comparable measures can be computed or reasonably closely estimated from several other measures. Effect sizes require some work to compute, and the needed information is sometimes not indicated in research reports, but most experts agree effect size is the best metric for integrating results across quantitative studies.
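As a minimal sketch of the effect size computation, the following assumes the standardized mean difference often called Cohen's d: the difference in group means divided by the pooled standard deviation. The function names and sample values are hypothetical, chosen only for illustration. The second function shows one way a comparable value can be recovered from a reported t statistic when the raw data are not available.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference between two groups of raw scores."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


def d_from_t(t_statistic, n_a, n_b):
    """Recover the same effect size from an independent-samples t statistic."""
    return t_statistic * (1 / n_a + 1 / n_b) ** 0.5


# Hypothetical scores, invented for illustration only.
treatment = [2, 4, 6]
control = [1, 3, 5]
print(cohens_d(treatment, control))  # 0.5
```

The conversion from t is only one of several possible routes; which one applies depends on what a given research report actually discloses.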
The following are several means for integrating results across quantitative studies:
Cooper, H. (1998). Synthesizing research (3rd ed.). Thousand Oaks, CA: Sage.
When summarizing and synthesizing quantitative research studies, researchers often encounter an interesting conundrum: is the sum of the results of the studies statistically significant? For example, it is common in a set of, say, 30 sample-based studies on some aspect of education or human resource development, to find that only about eight of the results are statistically significant. In that case many neophyte researchers are inclined to conclude the phenomenon of interest does not exist. But it is also not uncommon to find that those 30 studies had, say, 24 results in the expected direction (of which eight were statistically significant and 16 were not) and the remaining six results were in the unexpected direction. Compare that distribution with flipping a coin 30 times: it is highly unlikely that heads would come up 24 or more times. Indeed, a "sign test" using the binomial distribution or the large-sample normal approximation suggests that this will happen not more than once in 1,000 trials of flipping a coin 30 times if the phenomenon of interest does not exist in the population. Thus in this example we can prudently conclude there was some difference in the population, even though most of the studies did not find statistically significant differences. (The "sign test" is discussed in many introductory statistics textbooks and the binomial distribution is often included in the appended tables.)
This is a simple but profound insight into summarizing results across studies based on samples. Summarizing by calculating the percentage of studies with statistically significant results will often provide a misleading inference, suggesting the phenomenon of interest does not exist when it actually does. That is particularly so when the phenomenon is of only modest strength and when most of the studies have small sample sizes (below 100). Both of those conditions are common in the social sciences. Why does the percentage of statistically significant results mislead? The answer is that statistical significance testing procedures put priority on avoiding inferences that the phenomenon exists when it really does not, and the consequence of that priority is to raise the chances of inferring the phenomenon does not exist when it really does. While that trade-off is often justified when making inferences from one study, it is not necessary when making inferences across many studies.
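The coin-flip comparison can be checked directly. The sketch below computes the exact one-sided binomial probability of 24 or more results in the expected direction out of 30, assuming each direction is equally likely when the phenomenon does not exist. The function name is hypothetical.

```python
from math import comb


def sign_test_p(n_positive, n_total):
    """One-sided probability of n_positive or more 'heads' in n_total fair coin flips."""
    tail = sum(comb(n_total, k) for k in range(n_positive, n_total + 1))
    return tail / 2 ** n_total


p = sign_test_p(24, 30)
print(f"{p:.5f}")  # 0.00072, i.e. fewer than once in 1,000 trials
```

A two-sided version, which asks about 24 or more results in either direction, would simply double this value.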
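A small simulation can make this concrete. The sketch below is illustrative only and rests on invented assumptions: a true effect of 0.3 standard deviations, 20 subjects per group, normally distributed scores with a known standard deviation of 1 (so that a simple z-test applies), and a two-sided 5% significance level. It counts how often a study lands in the expected direction and how often it reaches statistical significance.

```python
import random
from statistics import mean


def simulate_studies(n_studies=2000, n_per_group=20, true_effect=0.3, seed=0):
    """Return (share of studies in the expected direction, share statistically significant)."""
    rng = random.Random(seed)
    # Standard error of the difference in means when both groups have unit variance.
    standard_error = (2 / n_per_group) ** 0.5
    positive = 0
    significant = 0
    for _ in range(n_studies):
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        difference = mean(treated) - mean(control)
        positive += difference > 0
        # Two-sided z-test at the 5% level.
        significant += abs(difference / standard_error) > 1.96
    return positive / n_studies, significant / n_studies


share_positive, share_significant = simulate_studies()
print(share_positive, share_significant)
```

Under these assumptions the large majority of simulated studies point in the expected direction while only a small minority are statistically significant, even though the effect genuinely exists in the simulated population.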
Yes, this is complicated! If your training in statistics is limited and you don't understand the above explanation, just remember: you should never summarize results across studies by calculating the percent of results that are statistically significant.
There are several other acceptable options for summarizing results across studies in addition to examining the proportion of positive results with the sign test. Indeed, when the necessary data is available from almost all of the studies, they are superior to the proportion of positive results. The most commonly used are: combining the actual statistical significance levels achieved in every study to simulate an overall significance level, calculating a weighted average of the effect sizes from every study, and calculating a weighted average of the Pearson correlation coefficients for every study. Harris Cooper describes these procedures on pages 120-142 of the above-mentioned book.
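The three options can be sketched as follows. These are common textbook variants and are assumptions on my part, not necessarily the exact procedures Cooper describes: significance levels are combined with Stouffer's unweighted Z method, effect sizes are averaged with sample-size weights, and correlations are averaged on Fisher's z scale with weights of n - 3. All function names are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def stouffer_combined_p(one_sided_p_values):
    """Combine one-sided p-values (each strictly between 0 and 1) into an overall p-value."""
    normal = NormalDist()
    z_scores = [normal.inv_cdf(1 - p) for p in one_sided_p_values]
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    return 1 - normal.cdf(z_combined)


def weighted_mean_effect(effect_sizes, sample_sizes):
    """Average effect size, weighting each study by its sample size."""
    total = sum(d * n for d, n in zip(effect_sizes, sample_sizes))
    return total / sum(sample_sizes)


def weighted_mean_correlation(correlations, sample_sizes):
    """Average Pearson r via Fisher's z transformation, weighting by n - 3."""
    weights = [n - 3 for n in sample_sizes]
    z_mean = sum(w * atanh(r) for w, r in zip(weights, correlations)) / sum(weights)
    return tanh(z_mean)


# Four hypothetical studies, none significant alone at the 5% level.
print(stouffer_combined_p([0.10, 0.10, 0.10, 0.10]))  # well below 0.05
print(weighted_mean_effect([0.2, 0.6], [100, 300]))   # 0.5
```

The first call illustrates the same point as the sign test: several individually non-significant results in the same direction can be jointly persuasive.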
All these summary procedures can also be applied after "categorizing" the studies by characteristics. For instance, you might group the studies into those that examined younger children and those that examined older ones, those with a short-duration treatment and those with a longer one, or those using standardized achievement tests and those using performance measures. When doing that, you are seeking to determine whether these differences in the studies help to explain the variations in the results. If so, there will be less variation within each category than for the full set of studies, and some differences between the categories. Looking at the variation of results within and between categories, for one characteristic at a time, is the equivalent of doing several different univariate analyses within a given study. That can be revealing, but it also can be misleading when the study characteristics (which become the independent variables) happen to be correlated.
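As a rough sketch of this kind of categorization, the following groups a set of study results by one characteristic and compares the variation within categories to the variation across the full set. The data, field names, and function name are all hypothetical, invented purely for illustration.

```python
from collections import defaultdict
from statistics import mean, pvariance


def variation_by_category(studies, characteristic):
    """Compare the spread of effect sizes overall with the spread inside each category."""
    groups = defaultdict(list)
    for study in studies:
        groups[study[characteristic]].append(study["effect"])
    overall = pvariance([study["effect"] for study in studies])
    # Average within-category variance, weighted by the number of studies per category.
    within = sum(len(values) * pvariance(values) for values in groups.values()) / len(studies)
    category_means = {name: mean(values) for name, values in groups.items()}
    return overall, within, category_means


# Hypothetical effect sizes for six studies.
studies = [
    {"age_group": "younger", "effect": 0.5},
    {"age_group": "younger", "effect": 0.6},
    {"age_group": "younger", "effect": 0.7},
    {"age_group": "older", "effect": 0.1},
    {"age_group": "older", "effect": 0.2},
    {"age_group": "older", "effect": 0.3},
]
overall, within, category_means = variation_by_category(studies, "age_group")
print(overall, within, category_means)
```

In this invented data the within-category variance is much smaller than the overall variance and the category means differ, which is the pattern that would suggest the characteristic helps explain the variation in results.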
A superior approach, but applicable only when there are 30 or more results, is to use multiple regression procedures to synthesize the results. This involves using several characteristics of the studies simultaneously to predict and explain the variance in the results. The procedure involves the following steps:
Cooper, H. & Hedges, L.V. (Eds.). (1994). The handbook of research synthesis. New York: Russell Sage Foundation.
Means for Integrating Results Across Qualitative Studies
The means for integrating results across qualitative research studies are far less formalized than those for integrating across quantitative studies. There are several reasons for this. Qualitative research itself is rarely designed to make generalizations, so it is more difficult to summarize and synthesize across qualitative studies than across quantitative ones. The contexts of qualitative research are usually better described than those of quantitative research, but the methodology, other than the general approach, is often not described in any detail; the former makes it easier to categorize the studies but the latter makes it harder. The "results" of qualitative research are rarely stated as parsimoniously as in quantitative research, making it more difficult to portray, summarize, and synthesize them. Finally, considerably less attention has been devoted to integrating across qualitative studies than across quantitative studies, and little work has gone into developing procedures for doing so.
Despite all this, all the means cited above for integrating conceptual frameworks, methodologies, and interventions can be applied to the results of a set of qualitative studies on a given topic. The following will explain some ways of doing so:
Noblit, G.W. & Hare, R.D. (1988). Meta-ethnography: Synthesizing qualitative studies. Thousand Oaks, CA: Sage.
Means for Integrating Derivative Findings Across Studies
The results of studies are usually reported with accompanying conclusions, implications, and/or recommendations, which have been inferred, at least in part, from the results. Conclusions are summaries of two or more results of a study, or statements about the generalizability of the results. Implications are the larger lessons suggested about the phenomena studied. Recommendations are exhortations to action. All three are usually found in the "Discussion" section of a research article or in the last chapter of a dissertation.
It is not uncommon to find two studies with identical results accompanied by dramatically different implications and recommendations. The converse is also true: two studies on the same topic with different results might be accompanied by identical implications and recommendations.
The following are some ways to integrate the conclusions, stated implications, and recommendations from a set of studies on a given topic:
Last Update: June 29, 2000