Reading in and Transforming Variables for Analysis in SPSS

Before any data can be analyzed by SPSS with the technique termed analysis of variance (ANOVA), the data to be analyzed must be introduced to, or entered into, SPSS. Subsequent chapters explain how to perform the several variants of ANOVA presuming that the data are already entered. This chapter’s focus is to provide information on how to get the variables into SPSS beforehand.
Methods for reading in or directly entering the data are described, as well as those for performing simple data transformations (e.g., computing an average).

READING IN DATA WITH SYNTAX

Before examining the syntax in Fig. 2.1, the reader is strongly advised to reread the syntax conventions discussed in the previous chapter. For example, the line numbers are not to be typed in.
The first statement in Fig. 2.1, “TITLE”, is an optional command (i.e., it is perfectly acceptable to leave it off; note the o beside the line number) that allows the user to specify a title in the printout. You decide what the title should be. In this example, ‘a one-factor anova design’ was used. The title does not in any way affect the analysis. It will simply appear at the top of each page of the printout.
The space between the command “TITLE” and the actual title is required. If you wish to use a long descriptive title, you may continue the title on the next line. To do this, simply indent the second line one space. However, SPSS will only repeat the first 60 characters of the title at the top of every page. Additional descriptive information can be added to the top of every page with the “SUBTITLE” command.
The “SUBTITLE” command is placed on the line following the “TITLE” command, with the actual subtitle separated from the “SUBTITLE” command by a space as follows: SUBTITLE example from chapter 2 of Page et al.
By default on many platforms, SPSS prints the results or output of your requested analyses on lines that are 132 characters wide. The optional command on line 2 reduces the size of the output to 80 columns, which makes it easier to see the entire output on your computer screen and also lets the output fit on an 8.5- × 11-in. piece of paper. In SPSS for Windows, this control over output size is given on Edit–Options–Viewer (or Draft Viewer); then click the desired alternative on Text Output.
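As a minimal sketch, the opening lines of such a program might look like the following. The title and subtitle are the ones quoted above. The `SET WIDTH=80` line is an assumption about what line 2 of Fig. 2.1 contains, because the figure itself is not reproduced here.

```spss
* Optional title, printed at the top of each page of output.
TITLE a one-factor anova design.
* Optional subtitle, printed beneath the title.
SUBTITLE example from chapter 2 of Page et al.
* Assumed form of the width command: limit output lines to 80 columns.
SET WIDTH=80.
```

None of these three commands affects the analysis itself; they only change how the printout is labeled and laid out.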

Entering Data with the “DATA LIST” Command

One of the most crucial steps in programming is telling the statistical package how to “read” your data file. There are several ways to do this, including, in Windows, typing values directly into the Data Editor Window, which is described later. The most general method, available to both non-Windows and Windows users, is through the use of syntax, specifically the “DATA LIST” command on line 3 in Fig. 2.1, which tells SPSS (a) where the data are and (b) what value to give each variable (or measure, or score) for each participant. If you have a very small data set, you may want to type the data within the SPSS program, as was done in the example of Fig. 2.1. If your data set is large, however, you may prefer to type the data in another file called an external file (a separate file of just data), which you will read into the program with the “DATA LIST” command, to be described later in this chapter.
The first example, however, assumes that the data are within the SPSS program, as in the example in Fig. 2.1. As seen in line 3, the command “DATA LIST” is followed by a keyword describing the type of format of the data, “FIXED” or “FREE” (more on this later). This line is followed by a subcommand (here in line 4; recall, however, that subcommands need not be on different lines) that provides the names you wish to give the variables and, for “FIXED” format, their column locations.
Thus, this subcommand will tell SPSS which variables are in which columns for “FIXED” or in which order for “FREE”. In this example of “FIXED”, as will be explained in more detail later, line 4 specifies that the variable to be called ‘facta’ is in column 1 and the variable to be called ‘dv’ is in columns 3 and 4. Because, in this example, the data are included in the program, the “BEGIN DATA” command on line 5 is used, followed by all of the data in lines 6 through 20 (in the columns or order specified on the “DATA LIST” command), followed by the “END DATA” command in line 21.

FIG. 2.1. Syntax commands to read in data.
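
Because the figure is not reproduced here, the following is only a sketch of a program with the layout described above: a title, a width command, “DATA LIST” with its subcommand, and 15 inline data rows between “BEGIN DATA” and “END DATA”. The `SET WIDTH=80` line is an assumed form of the line 2 command, the data values are invented for illustration and are not the values in Fig. 2.1, and the closing `LIST` command is an addition for checking the result.

```spss
TITLE a one-factor anova design.
SET WIDTH=80.
DATA LIST FIXED
  /facta 1 dv 3-4.
BEGIN DATA
1 12
1 10
1 14
1 11
1 13
2 16
2 18
2 15
2 17
2 19
3 22
3 20
3 24
3 21
3 23
END DATA.
* The 15 data rows above are invented for illustration only.
* LIST is an added check: it displays the values SPSS read for facta and dv.
LIST.
```

No comment lines appear before “END DATA”, so the 21 lines up to that point sit in the positions the text refers to: line 3 is “DATA LIST”, line 4 the subcommand, line 5 “BEGIN DATA”, and lines 6 through 20 the data.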

“FREE” or “FIXED” Data Format

The data format can be “FIXED” or “FREE”. “FREE” format data indicates that each participant’s score on each variable will be separated by one or more blank spaces. Furthermore, scores on a given variable may be located in different columns for different participants. However, the measures must be entered in the same order for all participants. Following is an example of a “DATA LIST” command for “FREE” format. To be particularly clear here, the blank spaces in the data are indicated with a “^”:
DATA LIST FREE /id age m1 m2 m3.
142^48^4^16^^7
78^^24^1^2^33
SPSS will understand that, for the first participant (i.e., the first line of data just presented), the variable you wish to call ‘id’ is to have the value 142, the variable you want called ‘age’ is to have the value 48, that he or she is to get a 4 on the variable you want called ‘m1’ (perhaps shorthand for
“Measure 1”), a 16 on ‘m2’, and a 7 on ‘m3’. For the second participant (second line of data), the participant’s ‘id’ is 78, his or her ‘age’ is 24, and he or she got a 1 on ‘m1’, a 2 on ‘m2’, and a 33 on ‘m3’. (Note that there is inconsistently more than one space between scores; this is completely permissible with the “FREE” data format.)
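A runnable sketch of the “FREE” example follows, with real blanks in place of the “^” marks. The values are those described in the text. The exact number of blanks between scores is arbitrary, and the inline data block and the `LIST` check are additions for illustration.

```spss
* Scores are separated by one or more blanks, in the same order for everyone.
DATA LIST FREE /id age m1 m2 m3.
BEGIN DATA
142 48 4 16  7
78  24 1 2 33
END DATA.
* Display the values that were read, to check them against the data lines.
LIST.
```

Because only the order of the scores matters in “FREE” format, the uneven spacing has no effect on the values assigned.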
Names you wish to give to variables can have no more than eight characters and they cannot begin with a number. Additionally, there are some sets of letters that can form keywords for some commands and, therefore, must be avoided as names. The sets of letters that you cannot use as variable names are the following: ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, and WITH. Ideally, the variable names should also be mnemonic, easily recognized by you later. For example, if the first variable in the data file represents a participant’s identification number, you might call that variable ‘subjid’ or ‘id’. The program must be consistent in the use of the names in the “DATA LIST” and other later (e.g., “MANOVA”) commands referring to the same variables.
“FIXED” is the other common data format. It is the default and thus the keyword “FIXED” does not actually have to be typed in if your data are in “FIXED” format. “FIXED” format means that the data are organized so that each variable is stored in a particular column (or columns). In this format, the subcommand contains an ordered list of the variable names you wish to use, each followed by the specific column or a successive series of columns where that variable is found. The columns containing a specific measure must be the same for all participants. Here is an example:
DATA LIST FIXED /id 1-3 age 4-5 m1 6 m2 7-8 m3 9-10.
Note that each variable is followed by a single digit or series of digits. The ‘6’ following ‘m1’, for example, tells SPSS that ‘m1’ can always be found in column 6 for every participant. In contrast, ‘id’, ‘age’, ‘m2’, and ‘m3’ are more than single digit variables; the first number following each refers to the column containing the first digit of the variable and the final number refers to the column containing the last digit of the variable. These are separated by a dash (‘-’) in the subcommand. Thus, ‘id’ is in columns 1 through 3 and ‘age’ is in columns 4 through 5. Thus, for the following data:
14248416^7
^78241^233
the first participant’s ID number is 142 (first 3 columns), his or her ‘age’ is 48, and he or she got a 4 on ‘m1’, a 16 on ‘m2’, and a 7 on ‘m3’. For the second participant (i.e., second line of data), the ‘id’ is 78, the ‘age’ is 24, and ‘m1’, ‘m2’, and ‘m3’ are 1, 2, and 33, respectively. Note that, when a variable is declared by the “DATA LIST” to have more than one column, but a certain participant has a value that requires fewer columns than specified, the columns to the left are blank. For example, whereas ‘m2’ has columns 7 through 8 devoted to it, the second participant’s value is only one column long, the value 2. The initial column (i.e., 10’s place) is therefore left blank (this process is called right justifying).
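The same two participants can be read in “FIXED” format with the sketch below, in which each “^” is replaced by a real blank so that every score sits in its declared columns. The inline data block and the `LIST` check are additions for illustration.

```spss
* Each variable is read from the same columns on every line.
DATA LIST FIXED /id 1-3 age 4-5 m1 6 m2 7-8 m3 9-10.
BEGIN DATA
14248416 7
 78241 233
END DATA.
* Display the values that were read, to confirm the column assignments.
LIST.
```

The second data line begins with a blank because the id 78 is right justified within columns 1 through 3. Likewise, the value 2 for ‘m2’ occupies column 8, leaving column 7 blank.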
If ‘m1’ were in the sixth column, ‘m2’ in the seventh, and ‘m3’ in the eighth, you could give the column numbers just once, as with:
/m1 m2 m3 6-8. or /m1 TO m3 6-8.
If the variables each took up more than one column, but all took up the same number of columns, the same shorthand would be possible. For example:
/k1 k2 k3 10-15. or /k1 TO k3 10-15.
would mean that the variable ‘k1’ is in columns 10 and 11, ‘k2’ is in 12 and 13, and ‘k3’ is in 14 and 15.
The following three subcommand lines tell SPSS the same thing and are interchangeable:
/id 1-3 age 4-5 m1 6 m2 7 m3 8 iq 24-26.
/id 1-3 age 4-5 m1 m2 m3 6-8 iq 24-26.
/id 1-3 age 4-5 m1 TO m3 6-8 iq 24-26.
The “TO” keyword is a convenient shortcut for entering a series of variables whose names differ only by the sequential number at the end. (In subsequent commands, the “TO” keyword can be used in a different way, as a shortcut to identify variables that were sequentially named on the “DATA LIST” or later created with transformations. For example, suppose the “DATA LIST” creates data in this order: q2, x, v3, iq, v4. Then ‘q2 TO v4’ can be used in later commands to refer to this set of successive variables.)
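Both uses of “TO” can be sketched as follows. The data values are invented for illustration, and the `DESCRIPTIVES` and `LIST` commands are simply convenient examples of later commands that accept a variable list. They are not commands taken from this chapter.

```spss
* On DATA LIST, TO names a numbered series and divides the columns equally.
* m1 TO m3 get one column each (6, 7, 8), k1 TO k3 get two each (10-15).
DATA LIST FIXED /id 1-3 age 4-5 m1 TO m3 6-8 k1 TO k3 10-15.
BEGIN DATA
10135425 121416
10228413 111513
END DATA.
* On later commands, TO refers to variables by their order in the file.
* Here age TO m3 means age, m1, m2, and m3.
DESCRIPTIVES VARIABLES=age TO m3.
* List only the two-column variables k1, k2, and k3.
LIST VARIABLES=k1 TO k3.
```

For the first invented participant, this layout gives ‘m1’ = 4, ‘m2’ = 2, ‘m3’ = 5, ‘k1’ = 12, ‘k2’ = 14, and ‘k3’ = 16.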
Look back at Fig. 2.1, beginning with line 6, and observe the succeeding rows. The first number in each row ranges between 1 and 3; that is because there are three values of the variable called ‘facta’. The first five participants (each having a separate line) are in the first value or “level” of ‘facta’, the next five are in the second level or group, and so on. The second number for each participant refers to that participant’s score on the dependent variable, called ‘dv’.

Some Special Cases

You can leave blank spaces between the numbers in “FIXED” format; just be sure to skip the same columns each time and be sure to identify the correct starting columns for your variables. Occasionally, you might have a string (i.e., text, word, or alphabetic) variable, in which the value is not a number, but a letter or string of letters. For example, imagine that you have recorded gender in the data in column 7 as M or F, rather than, say, 1 or 2. In this case, you would follow its column location in the subcommand with an “(a)”, that is, ‘gender 7 (a)’.
Sometimes you may have a variable that inherently contains a decimal place but you have not actually typed the decimal place in the data. In this case, you may identify the number of decimal places you wish the variable to have in parentheses in the subcommand (e.g., ‘gpa 8-10(2)’).
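A short sketch of both special cases follows, with invented data. Columns 4 through 6 are deliberately left blank to show skipped columns. The format code is written after the column location, which is the order SPSS expects for column-style specifications, as in the ‘gpa 8-10(2)’ example.

```spss
* gender is a one-column string variable in column 7.
* gpa is typed without a decimal point in columns 8-10, read with 2 decimals.
DATA LIST FIXED /id 1-3 gender 7 (A) gpa 8-10 (2).
BEGIN DATA
142   M325
 78   F400
END DATA.
* Display the values: gpa should appear as 3.25 and 4.00.
LIST.
```

The implied decimal places apply only when no decimal point is typed in the data, so ‘325’ is read as 3.25.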

1 USING SPSS AND USING THIS BOOK
Conventions for Syntax Programs
Creating Syntax Programs in Windows
2 READING IN AND TRANSFORMING VARIABLES FOR ANALYSIS IN SPSS
Reading In Data With Syntax
Entering Data with the “DATA LIST” Command
“FREE” or “FIXED” Data Format
Syntax for Using External Data
Data Entry for SPSS for Windows Users
Importing Data
Saving and Printing Files
Opening Previously Created and Saved Files
Output Examination
Data Transformations and Case Selection
“COMPUTE”
“IF”
“RECODE”
“SELECT IF”
Data Transformations with PAC
3 ONE-FACTOR BETWEEN-SUBJECTS ANALYSIS OF VARIANCE
Basic Analysis of Variance Commands
Testing the Homogeneity of Variance Assumption
Comparisons
Planned Contrasts
Post Hoc Tests
Trend Analysis
Monotonic Hypotheses
PAC
4 TWO-FACTOR BETWEEN-SUBJECTS ANALYSIS OF VARIANCE
Basic Analysis of Variance Commands
The Interaction
Unequal N Factorial Designs
Planned Contrasts and Post Hoc Analyses of Main Effects
Exploring a Significant Interaction
Simple Effects
Simple Comparisons and Simple Post Hocs
Interaction Contrasts
Trend Interaction Contrasts and Simple Trend Analysis
PAC
5 THREE (AND GREATER) FACTOR BETWEEN-SUBJECTS ANALYSIS OF VARIANCE
Basic Analysis of Variance Commands
Exploring a Significant Three-Way Interaction
Simple Two-Way Interactions
A Nonsignificant Three-Way: Simple Effects
Interaction Contrasts, Simple Comparisons, Simple Simple Comparisons, and Simple Interaction Contrasts
Collapsing (Ignoring) a Factor
More Than Three Factors
PAC
6 ONE-FACTOR WITHIN-SUBJECTS ANALYSIS OF VARIANCE
Basic Analysis of Variance Commands
Analysis of Variance Summary Tables
Correction for Bias in Tests of Within-Subjects Factors
Planned Contrasts
The “TRANSFORM/RENAME” Method for Nonorthogonal Contrasts
The “CONTRAST/WSDESIGN” Method for Orthogonal Contrasts
Post Hoc Tests
PAC
7 TWO- (OR MORE) FACTOR WITHIN-SUBJECTS ANALYSIS OF VARIANCE
Basic Analysis of Variance Commands
Analysis of Variance Summary Tables
Main Effect Contrasts
Analyzing Orthogonal Main Effects Contrasts (Including Trend Analysis)
Using “CONTRAST/WSDESIGN”
Nonorthogonal Main Effects Contrasts Using “TRANSFORM/RENAME”
Simple Effects
Analyzing Orthogonal Simple Comparisons Using “CONTRAST/WSDESIGN”
Analyzing Orthogonal Interaction Contrasts Using “CONTRAST/WSDESIGN”
Nonorthogonal Simple Comparisons Using “TRANSFORM/RENAME”
Nonorthogonal Interaction Contrasts Using “TRANSFORM/RENAME”
Post Hocs
More Than Two Factors
8 TWO-FACTOR MIXED DESIGNS IN ANALYSIS OF VARIANCE: ONE BETWEEN-SUBJECTS FACTOR AND ONE WITHIN-SUBJECTS FACTOR
Basic Analysis of Variance Commands
Main Effect Contrasts
Between-Subjects Factor(s)
Within-Subjects Factor(s)
Interaction Contrasts
Simple Effects
Simple Comparisons
Post Hocs and Trend Analysis
9 THREE- (OR GREATER) FACTOR MIXED DESIGNS
Simple Two-Way Interactions
Simple Simple Effects
Main Effect Contrasts and Interaction Contrasts
Simple Contrasts: Simple Comparisons, Simple Simple Comparisons, and Simple Interaction Contrasts
10 ANALYSIS OF COVARIANCE
Testing the Homogeneity of Regression Assumption
Multiple Covariates
Contrasts
Post Hocs
Multiple Between-Subjects Factors
ANCOVAs in Designs With Within-Subjects Factors
Constant Covariate
Varying Covariate
11 DESIGNS WITH RANDOM FACTORS
Random Factors Nested in Fixed Factors
Subjects as Random Factors in Within-Subjects Designs: The One-Line-per-Level Setup
The One-Factor Within-Subjects Design
Two-Factor Mixed Design
Using One-Line-per-Level Setup to Get Values to Manually Compute Adjusted Means in Varying Covariate Within-Subjects ANCOVA
12 MULTIVARIATE ANALYSIS OF VARIANCE: DESIGNS WITH MULTIPLE DEPENDENT VARIABLES TESTED SIMULTANEOUSLY
Basic Analysis of Variance Commands
Multivariate Planned Contrasts and Post Hocs
Extension to Factorial Between-Subjects Designs
Multiple Dependent Variables in Within-Subject Designs: Doubly Multivariate Designs
Contrasts in Doubly Multivariate Designs
13 GLM AND UNIANOVA SYNTAX
One-Factor Between-Subjects ANOVA
Basic Commands
Contrasts
Post Hoc Tests
Two-Factor Between-Subjects ANOVA
Unequal N
Main Effects Contrasts and Post Hocs
Simple Effects
Simple Comparisons
Interaction Contrasts
Three or More Factor ANOVA
One-Factor Within-Subjects ANOVA
Basic Commands
Planned Contrasts
Post Hoc Tests
Two or More Factor Within-Subjects ANOVA
Main Effect and Interaction Contrasts
Simple Effects and Simple Comparisons
Mixed Designs
More Complex Analyses
REFERENCES