An Introduction to Sociological Research Design (1998)

Mathieu Deflem
www.mathieudeflem.net

Potentially helpful notes. This version (mistakes and all) was prepared in 1998.
These notes draw heavily on The Practice of Social Research by Earl Babbie.
Cite as: Deflem, Mathieu. 1998. "An Introduction to Sociological Research Design." Unpublished paper. Available via www.mathieudeflem.net.



Outline of Topics

I. THE STRUCTURE OF SCIENTIFIC INQUIRY

A. Science, Theory, and Research
1. Science and Reality
2. From Theory to Research
B. Research Design, Measurement, and Operationalization
1. Research Design
a) Purposes of Research
b) Units of Analysis
c) Focus and Time of Research
2. Conceptualization and Measurement
a) Conceptualization
b) Measurement Quality
3. Operationalization
4. Indexes, Scales and Typologies
C. Causal Modelling
1. Assumptions of Causal Inquiry
2. Causal Order: Definitions and Logic
3. Minimum Criteria for Causality
D. Sampling Procedures
1. Probability Sampling
a) Simple Random Sampling
b) Systematic Sampling
c) Stratified Sampling
d) Cluster Sampling
2. Non-Probability Sampling
a) Quota Sampling
b) Purposive Sampling
c) Sampling by Availability
d) Theoretical Reasons for Non-Probability Sampling

II. METHODS OF OBSERVATION

A. Experimental Designs
1. The Structure of Experiments
2. Internal Validity and External Validity
3. Advantages and Disadvantages of Experiments
B. Survey Research
1. The Questionnaire
a) Questionnaire Construction
b) Question Wording
2. The Administration of a Questionnaire
a) Self-Administered Questionnaire
b) Interview Survey
c) Telephone Survey
3. Advantages and Disadvantages of Survey Research
C. Field Research
1. Entering the Field
a) The Role of the Field Researcher
b) Preparing for the Field and Sampling in the Field
2. In-Depth Interviewing
a) In-Depth Interviewing versus Questionnaire
b) Procedure of In-Depth Interviewing
c) Characteristics of In-depth Interviewing
3. Making Observations
4. Advantages and Disadvantages of Field Research
D. Unobtrusive Research
1. Content and Document Analysis
2. Historical Analysis
3. Advantages and Disadvantages of Unobtrusive Research
E. Evaluation Research
1. Measurement in Evaluation Research
2. The Context in Evaluation Research
3. Advantages and Disadvantages of Evaluation Research



I. THE STRUCTURE OF SCIENTIFIC INQUIRY

A. Science, Theory, and Research

Research starts with the researcher, the position where you stand, the world around you, your ethics, etc. The conceptions of the researcher influence the research topic and the methodology with which it is approached. Research is not just a matter of technique or methods.

What is specific to social-science research, as compared to say journalism, is the quest to examine and understand social reality in a systematic way. What is observed is as important as how it is observed.

General outline of a research: theory, conceptualization of theoretical constructs into concepts, formalization of relationships, operationalization, measurement or observation, data analysis or interpretation, report.

1. Science and Reality

Science, as a system of propositions on the world, is a grasp of reality; it is systematic, logical, and empirically founded. Epistemology is the science of knowledge (what is knowledge?), and methodology is the science of gathering knowledge (how to acquire knowledge?). Scientific inferences can be causal or probabilistic, and/or science seeks to offer an understanding of social processes. Factors that intervene in the process of scientific inquiry include the available tradition of research and the status of the researcher.

Scientific inquiry should reduce errors in observations (mistakes, incorrect inferences), and avoid over-generalizations and selective observations (e.g. only studying that which conforms to a previously found pattern).

Mistakes include: a) ex-post-facto reasoning: a theory is made up after the facts are observed, which is not wrong as such, but the derived hypothesis still needs to be tested before it can be accepted; b) over-involvement of the researcher (researcher bias); c) mystification: findings are attributed to supernatural causes; in social-science research, while we cannot understand everything, everything is potentially knowable.

Basically, the two necessary pillars of science are logic and observation (to retrieve patterns in social life, i.e. at the aggregate level). Note that people are not directly researched: social-science research studies variables and the attributes that compose them. A variable is a characteristic that is associated with persons, objects or events, and a variable's attributes are the different modalities in which the variable can occur (e.g. the attributes male and female for the variable sex). Theories explain relationships between variables, in terms of causation or understanding. Typically, this leads to the identification of independent and dependent variables (cause and effect), or of situation, actor, and meaning (interpretation).

2. From Theory to Research

Different purposes of social-science research can be identified: 1) to test a theoretical hypothesis, usually a causal relationship (e.g. division of labor produces suicide); 2) to explore unstructured interests, which usually involves a breaking through of the empirical cycle, shifting from induction to deduction (e.g. what is so peculiar about drug-abuse among young black females); 3) applied research, for policy purposes (e.g. market-research).

The basic model of research is: 1) theory, theoretical proposition; 2) conceptualization of the theoretical constructs, and formalization of a model, the relationships between variables; 3) operationalization of the variables stated in the theory, so they can be measured (indicators); and 4) observation, the actual measurements. The inquiry can be deductive, from theoretical logic to empirical observations (theory-testing), or inductive, from empirical observations to the search for theoretical understanding of the findings (theory-construction). (Note that, basically, it is always both, cf. Feyerabend; this is more than a mere alternation, rather a mutual constitution.) The wheel of science.
Deduction: the logical derivation of testable hypotheses from a general theory
Induction: the development of general principles on the basis of specific observations

B. Research Design, Measurement, and Operationalization

1. Research Design

Research design concerns the planning of scientific inquiry, the development of a strategy for finding out something. This involves: theory, conceptualization, formalization, operationalization of variables, preparations for observation (choice of methods, selection of units of observation and analysis), observation, data analysis, report (and back to theory).

a) Purposes of Research

The purposes of research are basically three-fold:

1) Exploration: to investigate something new of which little is known, guided by a general interest, or to prepare a further study, or to develop methods. The disadvantage of most exploratory studies is their lack of representativeness and the fact that their findings are very rudimentary.

2) Description: events or actions are observed and reported (what is going on?). Of course, the quality of the observations is crucial, as well as the issue of generalizability.

3) Explanation: this is research into causation (why is something going on?). This is extremely valuable research of course, but note that most research involves some of all three types.

b) Units of Analysis

The units of analysis refer to the what or who that is being studied (people, nation-states). Units of analysis can be (and often are) the units of observation, but not necessarily (e.g. we ask questions of individuals about their attitudes towards abortion, but analyze the religious categories they belong to). Units of analysis in social-science research typically include: individuals within a certain area at a given period of time; groups (e.g. the family); organizations (e.g. social movements); products of human action (e.g. newspapers in a content analysis); and so on.

Two common problems are: the ecological fallacy, i.e. making assertions about individuals on the basis of findings about groups or aggregations (e.g. higher crime rates in cities with a high percentage of blacks are attributed to blacks, but the crimes could actually be committed by the whites in those areas); and reductionism, i.e. illegitimate inferences from a too narrow (individual-level) conception of the variables considered to have caused something broader (societal) (e.g. Durkheim does not explain any individual's suicide, but only the suicide rates among certain categories of people).

c) Focus and Time of Research

The focus in a research project can be on: 1) characteristics or states of being (e.g. sex of an individual, number of employees in a company); 2) orientations or attitudes (e.g. prejudice of an individual; the political orientation of a group); and 3) actions, what was done (e.g. voting behavior of individuals; the riot participation of a group).

Research, considered in its time dimension, can be 1) cross-sectional at any given point in time; 2) longitudinal over a period of time to trace change or stability (e.g. panel study of the same people after two elections to see if and how their voting behavior changed); 3) quasi-longitudinal by investigating certain variables in a cross-sectional study (e.g. a comparison of older and younger people indicates a process over time).

2. Conceptualization and Measurement

a) Conceptualization

Theories are composed of statements that indicate relationships between constructs, i.e. particular conceptions which are labeled by a term. These constructs should be conceptualized, i.e. the meaning of the constructs must be specified, as a working agreement, into clearly defined concepts (which are still mental images). Then we can operationalize those concepts, i.e. specify indicators that measure the concept in terms of its different dimensions (e.g. the actions or the ideas that are referred to by the concept of crime). Note that this process reminds us that terms should not be reified into things.

Concepts, then, should be defined in two steps: first, a nominal definition of the concept gives a more precise meaning to the term, but it cannot yet be observed as such; therefore, second, the operational definition of the concept spells out how it is to be measured or observed, so that the actual measurement can be undertaken. Example: theoretical construct = social control; nominal definition of concept = social control as the individual's bonding to society; operational definition = attachment to primary institutions, which can be high or low; measure = years of education. Note that these specifications are absolutely necessary in explanatory research.

b) Measurement Quality

Measurements should ideally be precise, reliable, and valid. Note that reliability and validity refer to the relationship between measure and concept!

1) Reliability: does the replication of a measurement technique lead to the same results?

This refers to the consistency of the measurement techniques. Reliability can be checked through the test-retest method, i.e. the replication of a method on a phenomenon that could not, or should not, have changed, or of which the amount of expected change is known (e.g. asking for age, and asking again the next year, should lead to a difference of one year). Another technique for checking reliability is the split-half method: e.g. if you have ten indicators for a phenomenon, use five randomly chosen ones in one questionnaire and the other five in another, apply these to two random samples, and there should be no differences in the distribution of attributes on the measured variable between the two. Other reliability techniques are the use of established methods, and the training of researchers.
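The split-half logic can be sketched in a few lines of Python (the indicator scores and the random split are invented for illustration; the correlation function is written out to keep the sketch self-contained):

```python
import random

def pearson_r(xs, ys):
    # plain Pearson correlation between two equal-length score lists
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# ten hypothetical indicator scores per respondent (rows = respondents)
scores = [
    [4, 5, 4, 4, 5, 5, 4, 5, 4, 4],
    [2, 1, 2, 2, 1, 2, 1, 1, 2, 2],
    [3, 3, 4, 3, 3, 4, 3, 3, 3, 4],
    [5, 5, 5, 4, 5, 5, 5, 4, 5, 5],
    [1, 2, 1, 1, 2, 1, 2, 2, 1, 1],
]

# randomly split the ten indicators into two halves of five
random.seed(0)
items = list(range(10))
random.shuffle(items)
half_a, half_b = items[:5], items[5:]

# each respondent's summed score on each half
sum_a = [sum(row[i] for i in half_a) for row in scores]
sum_b = [sum(row[i] for i in half_b) for row in scores]

reliability = pearson_r(sum_a, sum_b)  # high value -> consistent measure
```

A correlation close to 1 between the two half-scores suggests the indicators measure the same thing consistently; a low correlation signals unreliability.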

2) Validity: does the method of measurement measure what one wants to measure?

This means different things: first, face validity is based on common-sense knowledge (e.g. the number of children is an invalid measure of religiosity); second, criterion or predictive validity is based on other criteria that are related to the measurement (e.g. racist actions should be related to responses to racist attitude scales); third, construct validity is based on logical relationships between variables (e.g. marital satisfaction measurements should correlate with measurements of marital fidelity); finally, content validity refers to the degree to which a measure covers all the meanings of a concept (e.g. racism as all kinds of racism, against women, ethnic groups, etc.).

Note that reliability is all in all an easier requirement, while on validity we are never sure. Note also the tension between reliability and validity, often there is a trade-off between the two (e.g. compare in-depth interviewing with questionnaire surveys).

3. Operationalization

Operationalization is the specification of concrete measures for the concepts in a research project (the determination of indicators). Some guidelines: be clear about the range of variation you want included (e.g. income, age), the amount of precision you want, and the dimensions of a concept you consider relevant.

In addition, every variable should have two qualities: 1) exhaustive: all the relevant attributes of a variable must be included (e.g. the magical 'other' category is best not too big), and 2) attributes should be mutually exclusive (e.g. whether a person is unemployed or employed is not exclusive, since some people can be part-time employed and part-time unemployed).

Variables are 1) nominal, when their attributes indicate different, mutually exclusive and exhaustive qualities (e.g. sex: male or female); 2) ordinal, when the attributes can also be ranked in an order (e.g. type of education); 3) interval, when the distance between attributes in an order is precise and meaningful (e.g. IQ test); and 4) ratio, when, in addition, these attributes have a true zero-point (e.g. age). Note that variables do not usually in and by themselves indicate whether they are nominal, ordinal, etc., and that you can sometimes convert them from one type to another (e.g. dummy variables, from nominal to metric).
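The conversion note can be illustrated with a minimal Python sketch (a hypothetical religion variable) of turning a nominal variable into dummy (0/1) variables:

```python
# a minimal sketch: converting a nominal variable into dummy (0/1) variables
religions = ["protestant", "catholic", "jewish", "none", "catholic"]

# one dummy variable per attribute, leaving out a reference category ("none")
categories = ["protestant", "catholic", "jewish"]
dummies = [{c: int(r == c) for c in categories} for r in religions]
# e.g. "catholic" becomes {"protestant": 0, "catholic": 1, "jewish": 0}
```

The reference category is recognizable as the row with all zeros; the resulting 0/1 columns can then be treated as metric in an analysis.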

Finally, note that you can use one or multiple indicators for a variable; sometimes even, a composite measurement is necessary. (note: see questionnaire design for an application of operationalization).

4. Indexes, Scales and Typologies

There are commonalities between indexes and scales: they both typically involve ordinal variables, and they are both composite measures of variables.

An index is constructed by accumulating scores assigned to individual attributes. The requirements of the items are: face validity (each item should measure the same attribute) and unidimensionality (only one dimension should be represented by the composite measure). Then you consider all the bivariate relationships between the items in the index; the relationships should be high, but not perfect.

A scale is constructed by accumulating scores assigned to patterns of attributes. The advantage is that it gives an indication of the ordinal nature of the different items, one item is in a sense included in the other (higher ranked).
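The difference between accumulating individual scores (index) and scoring response patterns (scale) can be shown in a toy sketch; the yes/no items and their ordering are hypothetical:

```python
# hypothetical yes/no items, ordered from "easy" to "hard" to agree with
responses = [1, 0, 1, 0]  # one respondent's answers to four ordered items

# index: simply accumulate the scores assigned to individual attributes
index_score = sum(responses)

# (Guttman-style) scale: score the *pattern* -- count agreements up to the
# first disagreement, since a higher-ranked item implies the lower ones;
# the later lone "1" is then an inconsistent response, not extra scale height
scale_score = 0
for r in responses:
    if r == 1:
        scale_score += 1
    else:
        break
```

Here the index gives 2 (two agreements), while the scale gives 1, reflecting that only the first item fits the cumulative pattern.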

A typology is a composite, nominal measure created by combining the attributes of two or more variables. As a dependent variable a typology is difficult to handle, since any one cell in the typology can be under-represented (it is best then to undertake a new analysis, making sure each cell is well represented).

C. Causal Modelling

1. Assumptions of Causal Inquiry

The first step in causal modelling involves conceptualization: what are the relevant concepts, and, second, how to operationalize these concepts. The next step is formalization, i.e. specification of the relationships between the variables. This seems to destroy the richness of the theory, but it helps to achieve comprehensibility and avoids logical inconsistencies. Note that this model is ideally based on a deductive approach, but it does not exclude a more dynamic approach which moves back and forth (from theory to data).

The causal model itself specifies not only the direction (from X to Y) but also the sign of the relationship (positive or negative). A positive relationship means that when X goes up, Y goes up; a negative relationship between X and Y means that as X goes up, Y goes down. Along a causal path, the signs should be multiplied to determine the net effect. A causal system is consistent when all the causal chains push the relationship in the same direction (indicated by the fact that all the signs are the same). When some signs are positive and others negative, the system is inconsistent (suppressors).
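The sign multiplication and the consistency check can be sketched as follows (a hypothetical two-path model from X to Y):

```python
# sign multiplication along causal paths (hypothetical model):
#   X -(+)-> A -(-)-> Y    and    X -(-)-> B -(-)-> Y
paths = [[+1, -1], [-1, -1]]

def net_sign(path):
    # multiply the signs along one causal chain to get its net effect
    sign = 1
    for s in path:
        sign *= s
    return sign

net_effects = [net_sign(p) for p in paths]  # [-1, +1]
consistent = len(set(net_effects)) == 1     # False: one path suppresses the other
```

Because one chain pushes Y down and the other pushes Y up, this toy system is inconsistent: the first path acts as a suppressor of the second.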

Note that causality is not simply found in reality (perhaps it is); it is above all put into the model by virtue of the theory. This involves a notion of determinism (for the sake of the model), and the decision to stop at some point in looking for any more causes or effects. Also note that the variables in a causal model are ideally all at the same level of abstraction.

Causal explanations can be idiographic or nomothetic: 1) idiographic explanations seek to explain a particular event in terms of all its causes (deterministic model); 2) nomothetic explanations seek to explain general classes of actions or events in terms of the most important causes (probabilistic model).

2. Causal Order: Definitions and Logic

Prior (unknown or not considered) variables precede the independent variable. Intervening variables are located in between the independent and dependent variable. Consequent variables are all variables coming after the dependent variable (unknown or not considered). Note that the identification of prior, independent, intervening, dependent, and consequent variables is relative to the model at hand.

The causal order between a number of variables is determined by theoretical assumptions that specify the causal system relating those variables. (Note that variables in a loop have no order, i.e. when the path from X to other variables returns from those variables back to X.)

The following possibilities can be distinguished:

- X causes Y
- X and Y influence each other
- X and Y correlate

Variable X causes variable Y when changes in X lead to changes in Y, or when fixed attributes of X are associated with certain attributes of Y. This implies, of course, that we talk about certain tendencies: X is a (and not the) cause. And this implies correlation as a minimum, necessary condition (the causation itself is theoretical).

3. Minimum Criteria for Causality

Rule 1: Covariation

Two variables must be empirically correlated with one another; they must co-vary, or one of them cannot have caused the other. This leads to the distinction between direct and indirect effects.

Rule 2: Time-order

When Y appears after X, Y cannot have caused X; in other words, the cause must have preceded the effect in time. Derived from this is the rule that when X is relatively stable, hard to change, and fertile (it produces many other effects), it is likely to be the independent variable.

Rule 3: Non-Spuriousness

When the observed correlation between two variables is the result of a third variable that influences both of those two separately, then the correlation between the two is spurious. This is indicated by a variable having a causal path to the two variables that correlate.

Basic to causality is the control of variables. Ideally, this is done by randomization in experiments: the attributes of any prior variables are then randomly distributed over the control and the experimental group. We can also purposely control for prior variables when we select the ones we consider relevant. In bivariate relationships, no variables are controlled, while in partial relationships, one or more of the prior and intervening variables that might interfere are controlled. It is better still to identify the necessary and sufficient causes of certain effects, but usually we settle for either one.

Some common errors are: biased selection of variables to be included in the model, unwarranted interpretation, suppression of evidence, and so on. It is instructive to see the different steps involved in a typical causality-type research project and what can go wrong at each step. First, from theory to conceptualization, this step is rarely clear-cut. Second, the step into operationalization is in a way always arbitrary (since the concept indicates more than any measurement). Third, the empirical associations found between measured variables are rarely, if ever, perfect. Finally, any measurement therefore requires additional studies, and any conclusion is in principle falsifiable (variables are shown to be associated, but then the question is how they are associated).

Strategies for causal analysis:

- When a bivariate non-zero relationship between X and Y is reduced to zero under control of a third variable, then the third variable explains the bivariate relationship, or the relationship is spurious (causality can never be proven by data analysis).
- Check for the effect of prior variables.
- Path analysis.

D. Sampling Procedures

Sampling refers to the systematic selection of a limited number of elements (persons, objects or events) out of a theoretically specified population of elements, from which information will be collected. This selection is systematic so that bias can be avoided. Observations are made on observation units, which can be elements (individuals) or aggregations of elements (families). A population is theoretically constructed and is often not directly accessible for research. Therefore, the study population, the set of elements from which the sample is actually selected, can (insignificantly) differ from the population. In multi-stage samples, the sampling units refer to elements or sets of elements considered for selection at a sampling stage. The sampling frame is the actual list of sampling units from which the samples are selected.

The sampling procedures are designed to best suit the collection of data, i.e. to measure the attributes of the observation units with regard to certain variables. Depending on theoretical concerns and choice of method, probability or non-probability sampling designs are appropriate in research.

1. Probability Sampling

Probability sampling is based on principles of probability theory which state that increasing the sample size will lead the distribution of a statistic (the summary description of a variable in the sample) to more closely approximate the distribution of the parameter (the summary description of that variable in the population). The standard error, inversely related to sample size, indicates how closely a sample statistic approximates the population parameter. These conditions are only met when samples are randomly selected out of a population, i.e. when every element in the population has an equal chance of being selected in the sample.

A randomly selected sample of sufficiently large size (absolute size, not size proportionate to the population) is assumed to be more representative of the population because the relevant statistics will more closely approximate the parameters, or the findings in the sample are more generalizable to the population. Representativeness of samples, or generalizability of sample findings, both matters of degree, are the main advantages of probability sampling designs. The accuracy of a sample statistic is described in terms of a level of confidence with which the statistic falls within a specified interval from the parameter (the broader the interval, the higher the confidence). The main disadvantage of probability sampling is that the theoretical assumptions (of infinity) never "really" apply.
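The inverse relationship between sample size and standard error can be illustrated with a toy simulation (an assumed normal population; the numbers serve only to show that the sampling distribution of the mean tightens as n grows):

```python
import random
import statistics

# a toy simulation over an assumed population (mean 50, sd 10)
random.seed(42)
population = [random.gauss(50, 10) for _ in range(100_000)]

def standard_error(sample_size, draws=500):
    # spread of the sample mean over many repeated random samples
    means = [statistics.mean(random.sample(population, sample_size))
             for _ in range(draws)]
    return statistics.stdev(means)

se_small = standard_error(25)    # roughly 10 / sqrt(25)
se_large = standard_error(400)   # roughly 10 / sqrt(400)
```

With a population standard deviation near 10, the simulated standard errors come out near 10/sqrt(n), i.e. about 2.0 for n = 25 and about 0.5 for n = 400: quadrupling precision requires sixteen times the sample size.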

a) Simple Random Sampling

In simple random sampling, each element is randomly selected from the sampling frame. Example: in an alphabetical list of all students enrolled at CU-Boulder, each student is given a number ascending from 1, and 400 students are selected using a table of random numbers.
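A sketch of the CU-Boulder example (the roster is invented; random.sample plays the role of the table of random numbers):

```python
import random

# hypothetical sampling frame: an alphabetical roster of 25,000 students
sampling_frame = [f"student_{i}" for i in range(1, 25_001)]

# simple random sampling: 400 students, each equally likely to be selected
random.seed(1)
sample = random.sample(sampling_frame, 400)
```

random.sample draws without replacement, so no student can be selected twice, matching the logic of numbering the list and consulting a table of random numbers.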

b) Systematic Sampling

In systematic sampling, every kth element in a list is selected into the sample, the distance k indicating the sampling interval. The systematic sample has a random start when the first element is randomly chosen (out of the numbers between 1 and k). Systematic sampling has the advantage of being more practical while being about as efficient as (sometimes more efficient than) simple random sampling. A disadvantage is the danger of an arrangement of elements forming a pattern that coincides with the sampling interval. Example: in a list of all students enrolled at CU-Boulder, each 100th student, starting with the randomly chosen 205th, is selected. Later it turned out that every other student in the list was female (and the entire sample female), since the composer of the list thought "perfect randomness" would lead to perfect probability samples.
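The same idea as a systematic sample with a random start might look like this (the list is invented):

```python
import random

# hypothetical list of 10,000 students
frame = [f"student_{i}" for i in range(1, 10_001)]
k = 100                                  # sampling interval

random.seed(7)
start = random.randrange(k)              # random start within the first k
sample = frame[start::k]                 # every kth element thereafter
```

Slicing with step k is the whole procedure; the danger described above arises precisely because the slice inherits any periodic pattern in the underlying list.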

c) Stratified Sampling

Stratified sampling is a modification to the use of simple random and systematic sampling. It is based on the principle that samples are more representative when the population out of which they are selected is homogeneous. To make samples more representative, strata of elements are created that are homogeneous with respect to the (stratification) variables which are considered to correlate with other variables relevant for the research (the standard error for the stratification variable equals zero). Example (stratified & systematic): luckily we know how stupid composers of student lists are, so we stratify students by sex (taking every other student in our "perfectly randomized" list); we thus get two strata of students based on sex, and select every 40th student in each stratum.
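The stratification step can be sketched like this (hypothetical student records; sex as the stratification variable, 25 selected per stratum):

```python
import random
from collections import defaultdict

# hypothetical records: 2,000 students, alternating F/M
students = [{"name": f"s{i}", "sex": "F" if i % 2 else "M"}
            for i in range(1, 2_001)]

# group the frame into strata homogeneous on the stratification variable
strata = defaultdict(list)
for s in students:
    strata[s["sex"]].append(s)

# select proportionately from each stratum (here 25 from each of two
# equally sized strata; within a stratum the draw is simple random)
random.seed(3)
sample = []
for members in strata.values():
    sample.extend(random.sample(members, 25))
```

Because the sample is forced to match the population exactly on sex, the standard error for the stratification variable is zero by construction.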

d) Cluster Sampling

In cluster sampling, clusters or groups of elements are created, and out of each group, elements are selected. This method is advantageous since often complete lists of the population are unavailable. Cluster sampling is multi-stage when first clusters are selected, then clusters within clusters (on the basis of simple random or systematic sampling, stratified or not), and so on, up until elements within clusters. While cluster sampling is more efficient, the disadvantage is that there are sampling errors (of representativeness) involved at each stage of sampling, a problem which is not only repeated at each stage, but also intensified since sample size grows smaller at each stage. However, since elements in clusters are often found to be homogeneous, this problem can be overcome by selecting relatively more clusters and fewer elements in each cluster (at the expense of administrative efficiency). When information is available on the size of clusters (the number of elements they contain), we can decide to give each cluster a different chance of selection proportionate to its size (then selecting a fixed number of elements within each cluster). This method has the advantage of being more efficient: since elements in clusters are typically more homogeneous, only a limited number of elements for each cluster has to be selected. Finally, disproportionate sampling can be useful to focus on any one sample separately, or for the comparison of several samples. In this case, generalizability of sample findings to the entire population should not and cannot be considered.

Example (multi-stage cluster, proportionate to size, stratified): for research on political attitudes of students in the USA, no list of all students is available, but we have a list of all US states; we select a number of states (clusters); they are given a chance of selection proportionate to the "size" of (number of universities in) each state, because, for instance, there are more universities in the north-eastern states (probability proportionate to size); out of the selected states, we select cities (again proportionate to size, since metropolitan areas have more universities), select universities out of each selected city, take the student lists of each selected university, and select a relatively small number of students (assuming homogeneity among them since we know all students in Harvard are conservative and everybody at CU-Boulder is a liberal).
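The probability-proportionate-to-size step can be sketched in two stages (state names and university counts are invented):

```python
import random

# multi-stage cluster sampling with probability proportionate to size;
# hypothetical state -> number-of-universities counts
clusters = {"NY": 120, "MA": 90, "CO": 30, "WY": 5}

random.seed(11)
states = list(clusters)
weights = [clusters[s] for s in states]

# stage 1: draw states with chances proportionate to their size
chosen_states = random.choices(states, weights=weights, k=2)

# stage 2: a fixed number (3) of universities within each chosen state
sample = {s: [f"{s}_univ_{i}" for i in random.sample(range(clusters[s]), 3)]
          for s in set(chosen_states)}
```

random.choices draws with replacement, so a large cluster may be selected more than once; taking a fixed number of elements per selected cluster is what makes the overall selection probabilities roughly equal across elements.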

2. Non-Probability Sampling

The choice between a probability and a non-probability design depends on theoretical premises and the choice of method. While probability sampling can avoid biases in the selection of elements and increase the generalizability of findings (these are the two big advantages), it is methodologically sometimes not feasible, or theoretically inappropriate, to undertake it. Then non-probability sampling designs can be used.

a) Quota Sampling

In quota sampling, a matrix is created consisting of cells combining attributes of different variables known to be distributed in the population in a particular way. Elements having all the attributes of a cell are selected relative to their proportion in the population (e.g. take 90% white and 10% black because, based on census data, that is the racial composition of the entire population). Although the information on which the proportionate distribution of elements is based can be inaccurate, quota sampling does strive for representativeness (but it is not based on probability theory).
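A minimal quota-filling sketch (using the 90/10 proportions from the example; the stream of available respondents is simulated):

```python
# quota sampling: accept passers-by only until each cell's quota is met
# (hypothetical quotas derived from assumed census proportions, n = 100)
quotas = {"white": 90, "black": 10}
filled = {cell: 0 for cell in quotas}
sample = []

def try_add(person):
    # accept a respondent only if their cell's quota is not yet full
    cell = person["race"]
    if filled[cell] < quotas[cell]:
        filled[cell] += 1
        sample.append(person)

# simulated stream of available respondents
stream = [{"race": "white"}] * 200 + [{"race": "black"}] * 20
for p in stream:
    try_add(p)
```

The sample ends up matching the assumed population proportions exactly, but since respondents enter by availability rather than random selection, no probability-theory claims about representativeness follow.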

b) Purposive Sampling

Purposive or judgmental sampling can be useful in exploratory studies or as a test of research instruments. In exploratory studies, elements can purposively be selected to disclose data on an unknown issue, which can later be studied in a probability sample. Questionnaires and other research instruments can be tested (on their applicability) by purposively selecting "extreme" elements (after which a probability sample is selected for the actual research).

c) Sampling by Availability

When samples are selected simply by the availability of elements, issues of representativeness of the population cannot justifiably be addressed. A researcher may decide to just pick any element that s/he bumps into. As such, there is nothing wrong with this method, as long as it is remembered that the selection of samples may be influenced by dozens of biases and cannot be assumed to represent anything more than the selected elements.

d) Theoretical Reasons for Non-Probability Sampling

The previous non-probability sampling designs are related to methodological concerns. In fact, the issue of representativeness does matter in the background of these designs, but it is conceived as not feasible or, worse, purported to be feasible though not founded on probability theory. However, more interesting and scientifically valuable are the non-probability sampling designs based on theoretical insight. In some theoretical models, it is unwise to conceive of the world in terms of probability, sometimes not even as something to be sampled. (This is a kind of purposive sampling, but now because of theoretical concerns.)

First, in field research, the researcher may be interested in acquiring a total, holistic understanding of a natural setting. As such, there is no real sampling of anything at all. However, since observations on "everything" or "everybody" can in effect never be achieved, it is best to study only those elements relevant from a particular research perspective (sometimes called "theoretical sampling" or "creative sampling").

Second, when the elements in a natural setting clearly appear in different categories, quota sampling "in the field" can be used. This is the same as regular quota sampling, but the decisions on relevant cells and proportions of elements in cells are based on field observations.

Snowball sampling is used when access to the population is impossible (a methodological concern) or theoretically irrelevant. The selection of one element leads to the identification and selection of others, and these in turn to others, and so on. (The principle of saturation, indicating the point when no more new data are revealed, determines when the snowball stops.) Example (cluster and snowball): in a study of drug-users in the USA, a number of cities (clusters) is randomly selected, a drug-user is selected in each city (e.g. through clinics), is interviewed and asked for friends who also use drugs, and so on. Example (snowball): a researcher is interested in African-American HIV-infected males in Hyde Park, Chicago; the research aims at in-depth understanding of this setting, and inferences about other HIV-infected males are trivial (apart from being impossible).
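The snowball procedure is essentially a breadth-first walk through a referral network, stopping at saturation; a sketch over an invented network:

```python
from collections import deque

# hypothetical referral network: who names whom as a fellow member
referrals = {
    "ann": ["bob", "cid"],
    "bob": ["cid", "dee"],
    "cid": ["ann"],
    "dee": [],
    "eva": ["dee"],   # never reached: nobody refers to eva
}

def snowball(seed, max_n=50):
    # interview the seed, then everyone they name, and so on, stopping
    # at saturation (no new names) or at a practical ceiling max_n
    seen, queue, sampled = {seed}, deque([seed]), []
    while queue and len(sampled) < max_n:
        person = queue.popleft()
        sampled.append(person)
        for friend in referrals.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return sampled

sample = snowball("ann")  # eva is unreachable from ann
```

The unreachable "eva" illustrates the design's known bias: the snowball can only cover the part of the population connected to the starting element.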

Third, the sampling of deviant cases can be interesting to learn more about a general pattern by selecting those elements that do not conform to the pattern. Example: 99% of the students at CU voted for Clinton, so I select those that did not, to find out why they are "deviant".

These samples are purposive samples with a theoretically founded purpose. As long as that is the case, their use may be perfectly justified and, according to some theories, even the only applicable ones. The main disadvantage of non-probability sampling designs is the lack of representativeness for a wider population. But again, based on some theories, these difficulties can precisely be advantages (as long as the methodological and theoretical positions are clearly stated, both probability and non-probability sampling designs can be equally "scientific").





II. METHODS OF OBSERVATION

A complete research design is not just a matter of determining the right methods of observation: there is always (or there had better be) theory first. The following procedure can be suggested.

First, there should be a theory that states what is to be researched, and how this connects to the already available body of literature (to ensure, or strive towards, cumulative knowledge). There is no "naked" or mind-less observation.

Second, the theory has to be conceptualized, so that the different variables of the theory are clearly defined and identified. This may also involve acknowledgment of the limitations of the approach.

Third, the research topic and methodology are formalized into observable phenomena. This involves specification of the research topic (where, when) and the methods of observation (how), as well as the way in which the data are to be analyzed, and what the anticipated findings are.

Finally, after the research is conducted, a report is drawn up, indicating theory, methodology, as well as findings.

A. Experimental Designs

The most important issue in an experiment is randomization (as a matter of internal validity). Below, issues of internal and external validity are discussed, along with the strengths and limitations of experiments with regard to the control of variables, i.e. all the variables that we know might interfere.

1. The Structure of Experiments

A classical experiment involves four basic components.

1) An experiment examines the effect of an independent variable on a dependent variable. Typically, a stimulus is either absent or present. In this way, a hypothesis on the causal influence between two variables can be tested (see logic of causal modelling). Both variables are, of course, operationalized.

2) An experiment involves pretesting and posttesting, i.e. the attributes of a dependent variable are measured, first before manipulation of the independent variable, and second after the manipulation. Of course, applied to one group, this may affect the validity of the results, since the group is aware of what is being measured (research affects what is being researched).

3) Therefore, it is better to work with experimental groups and control groups. We select two groups for study, then apply the pretesting-posttesting, and thus conclude that any effect of the tests themselves must occur in both groups. There can indeed be a Hawthorne effect, i.e. the attention given to the group by the researchers affects the group's behavior. Note that there can also be an experimenter bias, which calls for accurate observation techniques of the expected change in the dependent variable.

4) Selecting Subjects

Note that there can always be some bias because often students are selected (problem of generalizability). Also, note that samples of 100 are not very representative, and that experiments often have fewer than 100 subjects.

Randomization refers to the fact that the subjects (which are often non-randomly selected from a population) should be randomly assigned to either the experimental or the control group. This does not ensure that the subjects are representative of the wider population from which they were drawn (which they usually are not), but it does ensure that the experimental and the control group are alike, i.e. the variables that might interfere with the results of the experiment will, based on the logic of probability, be equally distributed over the two groups. Note that randomization is related to random sampling only in the sense that it is based on principles of probability (the two groups together are a "population", and the split into two separate groups is a random sampling into two samples that mirror each other and together constitute this "population").
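Random assignment of subjects to the two groups can be sketched as follows (an illustrative helper, assuming the subjects are simply shuffled and split in half):

```python
import random

def randomize(subjects, seed=None):
    """Randomly assign subjects to an experimental and a control group.

    Randomization does not make the subjects representative of a wider
    population, but it makes the two groups alike in expectation: any
    interfering variable is equally likely to land in either group.
    """
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]  # (experimental group, control group)
```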

Matching refers to the fact that subjects are purposely assigned by the researcher to either the control or the experimental group on the basis of knowledge of the variables that might interfere with the experiments. This is based on the same logic as quota sampling. Matching has the disadvantage that the relevant variables for matching decisions are often not all known, and that data analysis techniques assume randomness (therefore, randomization is better).

Finally, the experiments should be conducted in such a way that the only difference between the experimental and the control group is the manipulation of a variable during the experiment.

Taken together, randomization or matching, plus ensuring that the manipulation during experimentation is the only difference between the two groups, prevent all variables other than the manipulated one from interfering in the outcome of the experiment (internal validity!).

Note on the One-Shot Case Study:

A single group is manipulated on an independent variable, and then measured on a dependent variable. This method must involve a pretest and a posttest to be of any significance (otherwise there is nothing to compare), i.e. the one-group pretest-posttest design, but then we are not sure if it was the manipulated variable that caused the observed difference.

2. Internal Validity and External Validity

a) Internal Validity: did the experimental treatment cause the observed difference?

The problem of internal validity refers to the logic of design, i.e. whether other variables that may intervene were controlled: the integrity of the study. The problem can be that the conclusions of an experiment are not warranted based on what happened during the experiment. This can come about because of: a) accident: historical events can have occurred during the experiment and affected its outcome; b) time: people change, mature, during the period of experimentation; c) testing: the groups are aware of what is being researched; d) instrumentation: the techniques to measure pretest and posttest results are not identical (reliability); e) statistical regression: results are biased because the subjects started with extreme values on a variable; and f) other problems, including that the relationships may be temporal but not causal, and that the control group may become demoralized or engage in compensatory rivalry.

Randomization of subjects into an experimental and a control group (to ensure that only the experimental manipulation intervened, while other variables are controlled), and reliable measurements in pretest and posttest are guards against problems of internal validity.

b) External Validity: are the results of the experiment generalizable?

The problem of external validity refers to the issue of generalizability: what does the experiment, even when it is internally valid, tell us about the real, i.e. non-manipulated, world?

A good solution is a four-group experimental design (the Solomon four-group design): first an experimental and a control group with pretest and posttest, and second an experimental and a control group with posttest only. And better than anything else is a two-group design with posttest only when there is good randomization, since randomization ensures that all variables are evenly distributed between the experimental and control group, so that we do not have to do a pretest.

An experimental manipulation as close as possible to the natural conditions, without destroying internal validity, is the best way to ensure external validity.

c) Note on Ex-Post Facto Experiment

This is not a true experiment since there is (was) no control group. The manipulation of the independent variable has naturally occurred (e.g. an earthquake). We are of course not sure, say when we compare with a group where the natural "manipulation" did not take place, that there are (or are not) other variables involved (very bad on the control of variables).

3. Advantages and Disadvantages of Experiments

The isolation of the one crucial variable, when all others are controlled, is the main advantage of experiments (it can lead to hypothesis falsification). Experiments are well-suited for projects with clearly defined concepts and hypotheses, thus it is the ideal model for causality testing. It can also be used in the study of small-group interaction, possibly in a field research, i.e. as a natural experiment. Experiments can also be repeated.

The big disadvantage is the artificial character of the research; in the social sciences, moreover, experiments often involve ethical difficulties or simply cannot be executed.

B. Survey Research

Note on quantification, which is quite essential in survey research: numbers are representations of..., they are created, they represent something, so do not reify them (e.g. they are limited to the sample, and therefore to the sampling procedure, typically a probability sample design). You have to know the process that created the numbers or you cannot make any inferences. The power of the analytical tools (quantitative data analysis) should not be abused. Note that quantitative methods are generally better on matters of reliability, while qualitative methods are better on validity.

The main advantage of survey research is of course the generalizability of its findings because of the representativeness of the sample (see sampling - as a matter of external validity). Note that a pre-test of the questionnaire is always, I said always, necessary (as a matter of validity).

1. The Questionnaire

Survey research typically involves administering a questionnaire to a sample of respondents to draw conclusions on the population from which the sample is drawn. The questionnaire is standardized to ensure that the same observation method is used on all respondents. This involves considerations of questionnaire construction, question wording, and the way in which the questionnaire is administered to the respondents.

a) Questionnaire Construction

In the construction of the questionnaire, attention is devoted to increasing the respondents' cooperation and avoiding misunderstanding of the questions. First, the questionnaire format should be presentable, not too densely packed, and clear. This involves using intelligible contingency questions ("if no/yes go to..."), or matrix questions that contain all the items or response options to a question. Second, the effects of question order have to be considered; this can be pre-tested with different questionnaires, and by being sensitive to the research problem. Third, clear instructions on how to answer the questions should be given, and it is best to divide the questionnaire into different sections that are each preceded with instructions.

b) Question Wording

The question wording should equally enhance the unambiguous nature of the questionnaire. Several options are available depending on the research perspective: attitudes, for instance, can be measured with Likert scale questions (variation from strongly disagree to strongly agree). Questions can also be open-ended (and coded by the researcher for analysis) or closed-ended (an exhaustive list of mutually exclusive alternatives). Note that open-ended questions may pose problems for analysis (too many responses), while closed-ended questions may impose too rigid a framework on the respondents. Also, each statement should not be too long, not negatively phrased, and posed in neutral, unambiguous terms to avoid social desirability effects and bias in any one (pro/con) direction. Also avoid double-barreled questions, and make sure to ask comprehensible and relevant questions.
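Coding Likert-scale responses for analysis can be sketched as follows (the five-point mapping is the conventional one; the function name and the reverse-coding option are illustrative):

```python
# Conventional five-point Likert coding, from strongly disagree to strongly agree.
LIKERT = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
          "agree": 4, "strongly agree": 5}

def likert_score(responses, reverse=False):
    """Code Likert responses numerically.

    Reverse-coded items (e.g. negatively phrased statements) flip the
    scale so that higher scores always point in the same direction.
    """
    scores = [LIKERT[r.lower()] for r in responses]
    if reverse:
        scores = [6 - s for s in scores]
    return scores
```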

2. The Administration of a Questionnaire

Questionnaires can be administered in a variety of ways.

a) Self-Administered Questionnaire

In this type of survey, respondents fill out a questionnaire delivered to them by mail (with precautions taken to ensure a sufficiently high response rate), or questionnaires can be delivered "on the spot", e.g. in a factory or school. The basic problem is the monitoring of returns: you have to keep a return graph to track the response rate (which should exceed 50%), and send follow-up mailings to non-respondents.
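Monitoring returns and flagging non-respondents for a follow-up mailing can be sketched as follows (the 50% target reflects the adequate rate mentioned above; the id scheme and function name are illustrative):

```python
def monitor_returns(mailed_ids, returned_ids, target=0.50):
    """Track the response rate of a mail survey.

    Returns the current rate and, if the rate is still below the target,
    the list of ids to contact with a follow-up mailing.
    """
    returned = set(returned_ids)
    rate = len(returned) / len(mailed_ids)
    follow_up = [i for i in mailed_ids if i not in returned]
    return rate, (follow_up if rate < target else [])
```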

b) Interview Survey

In a (more time-consuming and expensive) interview survey, sensitive and complicated issues can be explored face-to-face. This method also ensures a higher response rate (which benefits generalizability), and a reduction of "don't know" answers. The interviewer has more control over the data collection process (note that observations can be made during the interview) and can clarify, in a standardized way, unclear questions. Since the questionnaire is the main measurement instrument, the interviewer must make sure that the questions have identical meaning to all respondents: interviewers should (and are trained to) be familiar with the questionnaire, dress like the respondents, behave in a neutral way during the interview, follow the given question wording and order, record the answers exactly, and probe for answers.

c) Telephone Survey

A questionnaire conducted by telephone is a cheaper and less time-consuming method, one moreover in which the researcher can keep an eye on the interviewers, but one on which the respondents can also hang up.

3. Advantages and Disadvantages of Survey Research

Survey research generally has the advantage that, depending on the research objective, it can serve descriptive, explanatory, as well as exploratory purposes. But more important than anything else, depending on sampling techniques, it can generalize findings to large populations, while the standardization of the questionnaire (and the way it is administered) ensures reliability of the measurement instrument. In addition, many respondents can be researched, relatively many topics can be asked about them (flexibility), and statistical techniques allow for accurate analysis. Note that pre-collected data can also be analyzed for a different purpose (secondary data-analysis).

The main weakness of survey research is its rather superficial approach to social life: because all subjects are treated in a unified way, the particularities of each cannot be explored in any great detail, and no knowledge is acquired of the social context of the respondents' answers. Also, surveys measure only answers, and not what this actually refers to (you know whether a person has responded to be "conservative" but not whether s/he is). Next, surveys are not so good in measuring action, but rather thoughts about action. This raises questions of validity: perhaps the questionnaire does not reveal anything "real", that is, anything of genuine concern for the respondents themselves.

C. Field Research

While surveys typically produce quantitative data, field research yields qualitative data. Also notice how field-research often not only produces data but also theory (alternation of deduction and induction).

1. Entering the Field

Depending on sampling procedure, a research site is selected and observations will be made and questions asked within the natural setting.

a) The Role of the Field Researcher

1) complete participant: the researcher is covertly present in the field and fully participates as if he is a member of the community under investigation; the problems are ethical, your mere presence might affect what goes on, and there are practical problems (e.g. when and how to leave the field?); 2) participant-as-observer: the researcher participates yet his identity is known; 3) observer-as-participant: the researcher observes and his identity is known; the latter two, since identity is known, may affect what's going on in the field, and it could cause the researcher to be expelled from the field; 4) complete observer: the researcher merely observes and his identity is not known.

b) Preparing for the Field and Sampling in the Field

Start with a literature review (as always), then research yourself: why are you interested? what will you bring to the field? etc. Then search for informants and gate-keepers, and make a good impression (or simply join the group you want to study). Establishing rapport is very important, and if your identity is known, it is important to tell them what you are there for (although you may choose to lie). Then sample in the field (see above). Remember that the overall goal of field research is to acquire the richest possible data.

2. In-Depth Interviewing

a) In-Depth Interviewing versus Questionnaire

While standardized questionnaires are typically, though not necessarily, employed in quantitative research, in-depth or unstructured interviewing is closely associated with qualitative field research. Like any interview, an in-depth interview can be defined as a "conversation with a purpose": an interview involves a talk between at least two people, in which the interviewer always has some control since s/he wants to elicit information. In survey interviews, the purpose of the conversation is dominant, especially when it involves the testing of hypotheses (a relationship between two or more variables). In-depth interviewing, in comparison, takes the "human element" more into account, particularly to explore a research problem which is not well defined in advance of the observation process. In-depth interviewing does not use a questionnaire, but the interviewer has a list of topics (an interview-guide) which are freely explored during the interview, allowing the respondent to bring up new issues that may prove relevant to the interviewer. The in-depth interviewer is the central instrument of investigation rather than the interview guide.

b) Procedure of In-Depth Interviewing

The procedure of in-depth interviewing first involves establishing a relationship with the respondent: even more than is the case with questionnaires, it is crucial that the interviewer gains the trust of the respondent, otherwise the interview will hardly reveal in-depth insight into the respondent's knowledge of, and attitudes towards, events and circumstances. Since the kind of information elicited in the interview is not pre-determined in a questionnaire, tape-recording (and negotiation to get permission) is appropriate. The role of the in-depth interviewer involves a delicate balance between being active and passive: active because s/he guides the respondent tactfully to reveal more information on an issue considered relevant, passive because the interviewer leaves the respondent free to bring up issues that were unforeseen but nevertheless turn out to be relevant. Since the interviewer should talk, listen, and think during the interview, his/her experience and skill greatly contributes to the quality of the research findings. Note that in a field research, the interview can be formal or informal: in formal in-depth interviewing the researcher's identity is known and the respondent knows that an interview is going on, while an informal in-depth interview appears to be (to the respondent) just a conversation with someone (who is actually a covert researcher).

c) Characteristics of In-depth Interviewing

In-depth interviewing has the advantage of being able to acquire a hermeneutic understanding of the knowledge and attitudes specific to the respondent (without an "alien", super-imposed questionnaire). It is often called a more valid research method. However, this assertion needs qualification: both in-depth and survey interviews approach human subjects with a perspective in mind, but only in in-depth interviewing is this perspective amenable to change (given the quest for what is unique to the person being interviewed), while in surveys it is not allowed to change (given the quest for generalizability of the findings). During a research process involving several in-depth interviews, the "big wheel of science" can freely rotate between induction and deduction (finding new things and asking about them, cf. grounded theory). In addition, the method is beneficial for explorative research on a (sociologically) new issue. The main weakness of in-depth interviewing is its lack of reliability: without a fixed questionnaire, the interviewer's flexibility, while allowing for new information, may affect the research findings, not because of respondents' characteristics, but because of the different ways in which they were interviewed. Since in-depth interviewing often does not rely on random sampling of respondents, issues of generalizability cannot (but often do not have to) be addressed. Finally, the results of in-depth interviews are harder to analyze than survey questionnaire findings, since they cannot easily be transferred into numbers (allowing for statistical analysis) but have to be brought together comprehensively in meaningful categories that do not destroy the uniqueness of the findings (the recent use of computerized techniques of qualitative data-analysis is helpful in this regard).

3. Making Observations

In your observations, be sure to see as much as you can and to remain open-minded on what you see; you want to understand, not to condemn or approve. Once you have taken up your role, do not get over-involved, nor completely disengaged.

Very important is to record what you observe accurately, and best as soon as possible after the event occurred. Therefore, you should keep a field journal (or tape). Field notes include what is observed and interpretations of what is observed. Also, keep notes in stages, first rather sketchy and then more in detail. Finally, keep as many notes as you can (anything can turn out to be important). Apart from that, a separate file can be kept on theoretical and methodological concerns, as well as reports of the researcher's own personal experiences and feelings.

As an initial step for analysis, the notes must be kept in files (with multiple entries), to discover patterns of behavior or practices, instances of attitudes and meanings of events for the observed, encounters of people in interaction, episodes of behavior (in which a sudden event can be crucial), and roles, lifestyles and hierarchies. These analytically conceived files should keep the chaos of observation together. Be flexible about your files.

The analysis itself can then proceed to discover similarities and differences: what re-appears in the field, which events seem to indicate the same pattern of behavior or thought, as well as what is "deviant" in the research site, and so on. Note, of course, that it is typical for field research that observing, formulating theory, evaluating theory, and analyzing data, can all occur throughout the research process.

Important tools to avoid problems of mis-interpretation or biased observations include: add quantitative findings to your field observations (triangulation), keep in touch with a supervisor, and ensure your self-awareness (introspection).

In writing up the report, an account of the method of observation and/or participation, as well as reflections of the researcher's experiences and motives are inevitable.

4. Advantages and Disadvantages of Field Research

Field research is especially appropriate if you want to research a social phenomenon as completely as possible (comprehensiveness), within its natural setting, and over some period of time. Also, the method is flexible, can move freely from induction to deduction, and is relatively inexpensive.

With regard to validity, field research is generally stronger than survey research. But as a matter of reliability, the method may be too much tied up to the person that did the research (which is why their methods and experiences have to be reported and evaluated). Finally, field research lacks generalizability, because of the uniqueness of the researcher's investigative qualities, because the comprehensiveness of research essentially excludes generalizability, and because of selectivity in observations and question asking. Therefore, the findings of field research are suggestive (not definitive).

D. Unobtrusive Research

Survey research and in-depth interviewing affect their object of study in at least (and hopefully only) one way: people are confronted with social-science research! Unobtrusive methods of inquiry, on the other hand, have no impact on what is being studied. There are three methods of unobtrusive research: content analysis, analysis of statistics, and historical analysis.

1. Content and Document Analysis

Content analysis refers to the quantitative study of written and oral documents. This requires sampling of the units of analysis in a source (best probability sampling), codification of the units, and finally classification of the units to reveal their manifest and latent content.
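The coding and classification steps of a content analysis can be sketched as a simple keyword count per category (the coding scheme shown is purely hypothetical; real coding schemes are derived from the conceptualization of the research):

```python
from collections import Counter

def code_documents(documents, coding_scheme):
    """Classify sampled text units by counting keyword hits per category.

    coding_scheme maps a category label to the (manifest) keywords that
    indicate it. Latent content, of course, requires interpretive coding
    that a keyword count cannot capture.
    """
    counts = Counter()
    for doc in documents:
        words = doc.lower().split()
        for category, keywords in coding_scheme.items():
            counts[category] += sum(words.count(k) for k in keywords)
    return dict(counts)
```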

Document analysis refers to the qualitative study of traces of the past: it involves the in-depth investigation of sources and aims at hermeneutic understanding.

2. Historical Analysis

Historical research refers to the study of the past through an examination of the traces the past has left behind (written documents, oral histories, and artefacts). The procedure of historical research typically involves: 1) selection of sources relevant for research; 2) identification and registration of sources according to formal and substantial criteria; 3) confrontation and (internal/external) critique of sources; 4) interpretation and analysis of sources to determine who said what to whom, why, how, and with what effect.

Three methods of data collection can be used in historical research (note that these methods do not have to be, but can be, historical): content analysis, document analysis, and the historical study of statistics. The historical investigation of statistics can trace a pattern over time (e.g. crime reports). Of course, you are again stuck with what you found (validity!).

See Comparative and Historical Sociology: Lecture Notes for much more.

3. Advantages and Disadvantages of Unobtrusive Research

The unobtrusive nature of research is the main advantage of the method: the researcher cannot affect what has happened. Several topics can be studied from this perspective, particularly forms of communication (who says what to whom, why and with what effect). Note that the techniques can be very rigidly applied (good on reliability). Also, it has the advantage that it saves time and money, and you can study long periods of time. Moreover, unobtrusive historical research can fulfill several purposes: 1) the parallel testing of theories, to apply a theory to several historical cases; 2) the interpretation of contrasting contexts, to reveal the particularities of historical events; and 3) analyzing causalities, to explain why historical events took place.

The main weakness of historical research is the historical fact that it is probably the least developed method of social-science research. Although many reputed sociologists used historical research methods (e.g. Durkheim on the division of labor, Marx and Weber on capitalism, Merton on science and technology), the idea that a study of the past can be meaningful in and by itself, or to grasp the present, only rarely inspires research. In addition, historical research can only reveal the past inasmuch as it is still present today: important documents, for instance, may be lost or destroyed (bad on validity). Finally, because of the often less rigid nature of this method of inquiry, the researcher can (invalidly) affect his/her picture of what has happened. Therefore, corroboration, the cross-checking of various sources, is helpful.

E. Evaluation Research

Evaluation research is intended to evaluate the impact of social interventions; as an instance of applied research, it intends to have a real-world effect.

Just about any topic related to occurred or planned social intervention can be researched. Basically, it intends to research whether the intended result of an intervention strategy was produced.

1. Measurement in Evaluation Research

The basic question is coming to grips with the intended result: how can it be measured? The goal of an intervention program has to be operationalized for it to be assessed in terms of success (or failure).

The outcome of a program has to be measured, best by specifying the different aspects of the desired outcome. The context within which an outcome occurred has to be analyzed. The intervention, as an experimental manipulation, has to be measured too. Other variables that can be researched include the population of subjects that are involved in the program. Measurement is crucial and therefore new techniques can be produced (validity), or older ones adopted (reliability).

The outcome can be measured in terms of whether an intended effect occurred or not, or whether the benefits of an intervention outweighed the costs thereof (cost/benefit analysis). The criteria of success and failure ultimately rest on an agreement.

The evaluation can occur by experiment, or by quasi-experiment. Time-series analysis, for instance, can analyze what happened for a longer period before and after an intervention, and with the use of multiple time-series designs, we can also compare with a pseudo control group.
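A minimal sketch of such a before/after comparison (a real time-series analysis would also model trend and seasonality, and a multiple time-series design would run the same comparison on a pseudo control group):

```python
def interrupted_series(series, intervention_index):
    """Compare mean levels of an outcome before and after an intervention.

    series: chronological measurements of the outcome variable.
    intervention_index: position in the series where the intervention occurred.
    Returns the change in mean level (after minus before).
    """
    before = series[:intervention_index]
    after = series[intervention_index:]
    mean = lambda xs: sum(xs) / len(xs)
    return mean(after) - mean(before)
```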

2. The Context in Evaluation Research

There are a number of problems to be overcome in evaluation research. First, logistical problems refer to getting the subjects to do what they are supposed to do; this includes getting them motivated and ensuring a proper administration. Second, ethical problems include concerns over the control group (which is not manipulated, and whose members may experience deprivation).

It is hard to control what is done with the findings of evaluation research: for instance, the findings may not be comprehensible to the subjects, may contradict 'intuitive' beliefs, or may run against vested interests.

Note social indicators research as a special type of evaluation research. This is the analysis of social indicators over time (pattern of evolution) and/or across societies (comparison). These indicators are aggregated statistics that reflect the condition of a society or a grouping.

3. Advantages and Disadvantages of Evaluation Research

The main advantage is that evaluation research can reveal whether policies work, or at least identify when they do not work (pragmatism), right away (when we use experiments) or over a long period of time and across societies (indicators). (different research instruments can be used in evaluation research)

The disadvantages include the special logistic and administrative problems, as well as the ethical considerations. Also, it can usually only measure the means, given certain program goals, but cannot go into questioning those goals themselves.
