Lesson A-6

Assessing the Methodology of the Study

There are four main aspects of the research methodology: design, sampling, data collection, and data analysis.  If inappropriate methodology is used, or if appropriate methodology is used poorly, the results of a study could be misleading.


Research Design

Research design specifies the group(s) for which data will be collected, to which group(s) and when the intervention will be applied, and when the data will be collected from each group.  The strength of a design, and the possible biases inherent in a design, depend on the type of questions being addressed in the research.

  • Descriptive and associational questions need designs that only specify when the data will be collected from one group of interest.
  • Causal questions are usually answered by designs where:
    • the intervention is applied to one group of two comparable groups, and measures of expected outcomes are made for both groups at the end of the intervention; 
    • repeated measures of the outcomes of interest are made several times before and after the intervention is applied to one group; or, 
    • for a group in which some units have received the intervention (perhaps varying levels of it) and some units have not, data are collected at one point in time on suspected causal variables, on the outcome variables of interest, and on various other variables that might have affected the outcomes of interest, and then the data are analyzed to determine whether the suspected causal elements actually had an effect on the outcomes.
  • Benefit-cost analyses require the designs for causal questions plus collection of data that permits calculations of the value of the benefits as well as the costs.
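The first two causal designs above can be sketched in a few lines. This is only an illustration: the function names and all scores are hypothetical, and a raw difference in means is not by itself evidence that the intervention caused the difference.

```python
import statistics

def posttest_difference(treated, comparison):
    """Design 1: difference in mean outcome between two comparable groups
    measured at the end of the intervention."""
    return statistics.mean(treated) - statistics.mean(comparison)

def pre_post_change(before, after):
    """Design 2: change in the mean of repeated measures taken on one group
    before and after the intervention."""
    return statistics.mean(after) - statistics.mean(before)

# Hypothetical outcome scores, invented for illustration only.
treated = [72.0, 75.0, 78.0]
comparison = [70.0, 71.0, 72.0]
before = [60.0, 61.0, 62.0, 61.0]   # several measurements before the intervention
after = [66.0, 67.0, 65.0, 66.0]    # several measurements after the intervention

print(posttest_difference(treated, comparison))
print(pre_post_change(before, after))
```

In a real study these differences would then be examined with an appropriate statistical procedure and interpreted in light of the biases discussed in this lesson.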
It should be noted that “experimental designs” are sometimes alleged to be the “gold standard” in the social sciences.  This is nonsense.  Experimental designs are not needed to answer descriptive and associational questions, and they can do only part of what is needed in benefit-cost analyses.  Their potential strength is only in answering causal questions, and their power for that is easily compromised when researching complex educational or workplace innovations.

The gold standard in medicine is the “double-blind, placebo-controlled experiment” that is commonly used to test new medications (but not to test new surgical procedures, for reasons that should soon be apparent).  Subjects are assigned at random to treatment or control; half are given the new pill and half are given a similar pill that is inert.  The people providing the pills and instructions to the patients don’t know whether they are handing over the medicine or the placebo (they are “blind” to that).  Likewise, the people who later measure the potential impacts on the medical condition and the possible side effects don’t know which patients actually received the medication and which did not.

These conditions rarely can prevail when testing complex educational or workplace innovations.  It is rare that a convincing placebo can be concocted and administered.  It is often difficult to prevent some spill-over of the treatment, whereby those receiving the treatment share it with some friends who are in the control group.  It is also difficult to prevent those not receiving the treatment from seeking alternative treatments on their own.  It is rare that those administering the innovation can do so without knowing they are using the “treatment” rather than the “placebo.”  It is also rare that those measuring the effects are “blind” about who did and did not receive the treatment, although this is sometimes feasible to arrange.  This is not to say that experimental designs are a waste of time in answering causal questions in education and worksite research.  Sometimes they are the best option, but rarely are they “golden.”

In qualitative research, often the specific questions of interest emerge in the course of the study and thus the design for answering them must also emerge.  While the designs described above tend to be explicitly discussed in quantitative research, they can be applicable to qualitative research.  For instance, if the main question is what recent Central American immigrant youth perceive DC school life to be, a phenomenologist could intensively study the perceptions of several such youth already in one or more DC schools.  If the main question is whether Math Explosion software can boost these youths’ math skills, an ethnographer would have a stronger basis for answering the question if he or she intensively studied these youths’ application of math in school and outside, for a while before the youth start using the software, during the use, and then afterwards.


Sampling

Sometimes a study involves the entire population of interest, but more often it involves only a small portion of the students, employees, families, schools, communities, or other “units of analysis.”  Sampling serves three purposes:

  • It reduces the costs and time required to do the research;
  • It often improves the quality of information by allowing more intensive data collection than would otherwise be possible; and,
  • It reduces the burden on respondents.
There are four main steps to sampling that are important to the interpretation of the results.  There may be weaknesses in one or more of the steps.  The terminology and procedures of sampling differ somewhat between quantitative and qualitative research, and the quantitative framework is used immediately below.  The steps are:
  1. Specification of a “population” (or “universe”) to which you wish to generalize.  One cannot properly make inferences beyond the population that was sampled.

  2. Identification of a sampling frame that lists all the persons, families, etc. in the desired population.  Often no perfect frame exists, and available or compiled lists include some people not in the population and perhaps list some people more than once.

  3. Drawing the sample.  Quantitative research using inferential statistics requires random sampling; qualitative research usually uses non-random procedures.

  4. Securing the needed data from the sample.  Usually not all people included in a sample can be contacted and are willing to participate in the data collection.  Some who do participate will fail to provide some of the needed data, either because they do not know the information or because they do not want to divulge it.  Response rates in surveys and long-term follow-ups of experiments are often very low (15-30 percent), and often it is difficult to ascertain whether the respondents are representative of the other 70-85 percent of the people.
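Steps 2 through 4 can be illustrated with a minimal sketch. The frame, the names in it, and the helper functions are hypothetical; the sketch assumes a simple random sample, which is only one of several random sampling procedures.

```python
import random

def draw_simple_random_sample(frame, n, seed=0):
    """Draw n distinct units from a sampling frame after removing duplicate listings."""
    # dict.fromkeys keeps the first occurrence of each unit and preserves order.
    unique_units = list(dict.fromkeys(frame))
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return rng.sample(unique_units, n)

def response_rate(sample, respondents):
    """Share of the sampled units that actually provided data."""
    responded = set(sample) & set(respondents)
    return len(responded) / len(sample)

# Hypothetical frame in which one person ("ana") is listed twice.
frame = ["ana", "ben", "cy", "dee", "ana", "eli", "fay", "gus"]
sample = draw_simple_random_sample(frame, n=4)

# Pretend only the first sampled person responded.
respondents = sample[:1]
print(len(sample), response_rate(sample, respondents))
```

Removing duplicate listings before the draw keeps each unit's chance of selection equal; the response rate then shows how much of the drawn sample the results actually rest on.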
Most quantitative research in education, human development, and human resource development falls short of what is needed for a solid sample.  Most studies do not sample randomly from a frame that closely coincides with the population of interest, but rather "conveniently" select several schools, homes, or worksites that are located near the researcher and agree to participate.  For long interventions and long-term follow-ups, some data is often missing for a substantial percentage of the sample.  Preventing these shortcomings would usually greatly increase the cost of the study.  Although one can never know with certainty, sometimes post-hoc analyses comparing characteristics of the sampled units with the population, and characteristics of the respondents with the initial sample, can suggest that one or both are representative.  Without such evidence, caution should be used in generalizing the results beyond the cases actually studied.
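A post-hoc comparison of this kind can be as simple as comparing the distribution of one characteristic in two groups. The sketch below is illustrative only: the characteristic (school location) and all counts are hypothetical, and how large a gap is worrying is left to the reader's judgment.

```python
from collections import Counter

def proportions(values):
    """Proportion of units falling in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

def largest_gap(group_a, group_b):
    """Largest absolute difference in category proportions between two groups."""
    pa, pb = proportions(group_a), proportions(group_b)
    return max(abs(pa.get(c, 0.0) - pb.get(c, 0.0)) for c in set(pa) | set(pb))

# Hypothetical data: the initial sample is half urban, but most respondents are urban.
initial_sample = ["urban"] * 50 + ["rural"] * 50
respondents = ["urban"] * 24 + ["rural"] * 6

print(proportions(respondents))
print(largest_gap(initial_sample, respondents))
```

A large gap on a characteristic that plausibly relates to the outcomes suggests the respondents may not represent the initial sample; a small gap is reassuring but does not prove representativeness.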

Qualitative research, and some quantitative research, use non-random samples.  Non-random samples include quota samples in which the researcher selects participants in proportion to one or more characteristics of the population.  Typical case samples are drawn to represent the median characteristics of the population.  Critical cases are drawn to represent certain subgroups of the population that are of particular interest. All these have merit.
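As a rough illustration of quota sampling, the sketch below sets a quota for each category in proportion to its share of the population and then fills the quotas with whoever is available first. The function names, the shares, and the volunteers are all hypothetical.

```python
def quota_targets(population_shares, sample_size):
    """Number of participants wanted per category, in proportion to population shares.
    Rounding means the targets may not sum exactly to sample_size."""
    return {group: round(share * sample_size)
            for group, share in population_shares.items()}

def fill_quotas(volunteers, targets):
    """Take volunteers in arrival order (a non-random procedure) until each quota is full."""
    remaining = dict(targets)
    selected = []
    for name, group in volunteers:
        if remaining.get(group, 0) > 0:
            selected.append(name)
            remaining[group] -= 1
    return selected

# Hypothetical population shares and volunteers.
targets = quota_targets({"urban": 0.5, "rural": 0.5}, sample_size=4)
volunteers = [("v1", "urban"), ("v2", "urban"), ("v3", "rural"),
              ("v4", "urban"), ("v5", "rural"), ("v6", "rural")]
selected = fill_quotas(volunteers, targets)
print(targets, selected)
```

The sample matches the population on the quota characteristic, but because the participants within each category are not chosen at random, it may still differ from the population in other ways.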

The sampling done in qualitative research, however, is often problematic for at least two reasons.  First, the researcher may consciously or subconsciously draw cases partly for reasons other than the stated one.  For instance, an ethnographer investigating suspected adverse effects of state education reforms on minority youth may select the few classrooms for intense observation not only to be “typical” but also partly because they are known not to be handling the reforms well.  Second, qualitative researchers often don’t explain how they selected the people to observe or to interview, and they rarely tell you what portion of those initially selected refused to cooperate.  Consequently, it often is difficult to judge the adequacy of sampling in qualitative research.

Data Collection

Quantitative researchers develop most of their questions and hypotheses very specifically before the study, and then find or develop instruments for collecting the data.  That gives them opportunity to refine each item, but no opportunity to address new questions that may arise from the early data collection.  Qualitative researchers usually start with a qualitative research methodology (such as historiography, ethnography, phenomenology) and often an interpretive paradigm, and then collect data intensively by observation and unstructured interviews.  That allows them to use early findings to generate new questions that they examine in the later stages of data collection, but they often have to focus their observations and develop their interview questions on the fly without any opportunity to refine them.

The means of data collection in social science are diverse.  For instance, one can observe and code or note, administer tests of skills, administer various personality and attitude inventories, interview people in person or by phone, mail out questionnaires, content-analyze transcripts of dialogue, and review official documents.

There are two key elements of data collection in quantitative research: the instruments and the data collection procedures.  The term “instruments” in the social sciences usually refers to written forms on which the researchers or the people being studied record information.  Mechanical and electrical measures are also occasionally used.

Two concepts are central to quantitative measurement: reliability and validity.  Reliability means the instrument consistently gives the same value when measuring any given level of a phenomenon.  Validity means that the value yielded by the instrument is accurate.  Reliability is necessary but not sufficient for valid measurement.  For instance, careful use of a ruler will allow measurements that are consistent to within about 1/16 of an inch, but the measurements will not be accurate if the user unknowingly has a ruler that has shrunk one inch.  Some measures in quantitative social science have credible evidence of their reliability and validity, but most do not and thus must be judged on whatever is apparent from reviewing them.  Do the instruments seem to cover all the important issues?  Is there balance, or do most of the items address only strengths or only weaknesses?  Is a wide range of responses, ratings, scores, etc. possible?  Are the instruments easy to use correctly?  Were new instruments that were developed specifically for the study pilot-tested?  Who collected the data, with what advance training, with what introductions and instructions provided to the participants, and with what monitoring of the data collection?
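The ruler example can be put in numbers. The sketch below assumes a 12-inch ruler that has shrunk to 11 true inches, so every reading is inflated by the same factor; the object length and the small reading errors are hypothetical.

```python
import statistics

TRUE_LENGTH = 10.0    # inches; hypothetical object being measured
SCALE = 12.0 / 11.0   # assumed: 12 marked inches now span only 11 true inches
READING_ERRORS = [-0.05, 0.02, 0.0, 0.04, -0.01]  # hypothetical, each within 1/16 inch

# Each reading is the inflated length plus a small reading error.
readings = [TRUE_LENGTH * SCALE + error for error in READING_ERRORS]

spread = max(readings) - min(readings)           # small spread: the readings agree (reliable)
bias = statistics.mean(readings) - TRUE_LENGTH   # large bias: the readings are wrong (not valid)

print(round(spread, 3), round(bias, 3))
```

The readings agree closely with one another yet are all well off the true length, which is what it means for a measure to be reliable but not valid.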

Qualitative research relies much less on instruments, making the procedures all-important.  The data collection is usually done by doctoral students or scholars, rather than delegated to people with lesser research experience, which is often done in quantitative research.  Qualitative research reports usually provide a very general idea of how the data was collected but provide few specifics.  These reports rarely indicate what questions were posed in the interviews; indeed, the questions often vary from one interviewee to another, making a report of the questions impractical.  The reports also rarely indicate what potentially important events were not observed because of various constraints.  Often the only practical way to assess qualitative research data collection is to check whether the investigator at least sought data to challenge or verify his or her early results.

Virtually all data collection methods have their shortcomings and potential biases. Experienced researchers, both quantitative and qualitative, know it is best to try to measure the most important variables with multiple items and/or multiple means, and then compare the results. 

Data Analysis

In quantitative research, well-established statistical procedures are usually used.  The appropriateness of the selected procedures can be judged by two criteria.  The first is whether the design and data meet the assumptions of the procedure.  Some of the more common assumptions concern the level of measurement (nominal, ordinal, interval, and ratio), normality of distributions (for parametric statistics), and homogeneity of variance (for ANOVA).  The second criterion is whether the selected statistical procedure is the most powerful of the available procedures whose assumptions were met.
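As one small illustration of checking an assumption, the sketch below compares the sample variances of the groups that would enter an ANOVA. It is only a descriptive screen on hypothetical scores, not a formal test; formal procedures such as Levene's test exist for this purpose.

```python
import statistics

def variance_ratio(groups):
    """Ratio of the largest to the smallest sample variance across groups.
    A ratio near 1 is consistent with homogeneity of variance."""
    variances = [statistics.variance(group) for group in groups]
    return max(variances) / min(variances)

# Hypothetical scores for two groups; the second is far more spread out.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]

print(variance_ratio([group_a, group_b]))
```

A report that describes checks of this kind, or names the formal test used, gives the reader some basis for judging whether the assumptions of the chosen procedure were met.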

There is an important aspect of quantitative data analysis that is more difficult to judge: the care with which the data were handled before the analysis and the care with which the data analysis was actually conducted.  Manually recorded data almost always include errors.  Some of the errors can be identified by reviewing the data forms, and for some of those identified errors, the correct value can be inferred.  Data entry into the computer usually results in some errors, and those can be detected by a second independent keying and automatic check, or by visual comparison of the data forms and the computer record.  Some additional data errors can be identified by computer edits for values that are out of the eligible range or inconsistent with each other.  In addition to data errors, there can be errors in the commands given to the statistical software.  The classic warning of professional data processors is “Garbage In, Garbage Out (GIGO).”
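The two checks described above, a second independent keying and computer edits, might look like the following sketch. The field names, the eligible ranges, and the consistency rule are assumptions made up for illustration; a real study would take them from its own codebook.

```python
def keying_mismatches(first_pass, second_pass):
    """Return (record_id, field) pairs where two independent keyings disagree."""
    mismatches = []
    for record_id, row in first_pass.items():
        other = second_pass.get(record_id, {})
        for field, value in row.items():
            if other.get(field) != value:
                mismatches.append((record_id, field))
    return mismatches

def edit_check(row):
    """Flag values outside the eligible range or inconsistent with each other."""
    problems = []
    if not 5 <= row["age"] <= 21:      # assumed eligible age range for students
        problems.append("age out of range")
    if not 0 <= row["grade"] <= 12:    # assumed eligible grade range
        problems.append("grade out of range")
    # Assumed consistency rule: a student is at least 4 years older than the grade number.
    if row["age"] - row["grade"] < 4:
        problems.append("age inconsistent with grade")
    return problems

# Hypothetical records keyed twice; record 2 has a transposed age in the first pass.
first = {1: {"age": 15, "grade": 9}, 2: {"age": 51, "grade": 10}}
second = {1: {"age": 15, "grade": 9}, 2: {"age": 15, "grade": 10}}

print(keying_mismatches(first, second))
print(edit_check(first[2]))
print(edit_check({"age": 8, "grade": 9}))
```

Double keying catches the transposed age because the two passes disagree; the edit check catches it independently because 51 falls outside the eligible range, and it also flags a record whose values are each in range but inconsistent with each other.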

The reader of a research report may detect some errors from implausible results or inconsistencies within or between the tabulated results. Otherwise the best assessment of the data handling is to look in the report for an indication that the data were manually edited, the data entry was verified, and the data file was subjected to further computer edits before the analyses began.

The data analysis of qualitative research is generally inductive, interactive, and iterative. It usually involves the identification of categories, themes, relations among both, and the cross verification of tentative answers to descriptive, associational, and causal questions. The analysis is often described or implied in the discussion of the findings. Competent and careful qualitative data analysis is usually indicated by the researcher exhibiting healthy skepticism, drawing on multiple lines of evidence, and testing his or her early findings with subsequent evidence.

Key Assessment Questions
6. Is the design suitable for the types of questions to be answered?
7. Is the sample likely to be representative of the population or sub-population of interest and was data secured from a large portion of the initial sample?
8. Are the data collection instruments and procedures likely to have measured all the important characteristics with reasonable accuracy?
9. Are the quantitative analysis procedures appropriate, does the qualitative analysis cross-verify important findings, and does all the data analysis appear to have been done with care? 
Note: If the information in the research report does not satisfactorily answer most of these questions, you should worry about the quality of the methodology.  Nevertheless, that is not sufficient evidence to conclude that the research methodology is inadequate.


Last Update: June 29, 2000