Processing Output

When you've collected enough data using your web form, you're going to want to get all of that output into a spreadsheet-based piece of software so you can manipulate and analyze the data. The "brute force" method of doing this is to simply print out all those emails, database files, or what have you, and manually re-type the data into the program of choice. There are two reasons you should avoid this: (1) You increase the chance of data entry errors by re-typing, and, (2) It's boring, time-consuming, and unnecessary.

And, more importantly, this is one of the major advantages of web-based data collection -- the data you need are already "typed" into to a computer file by the user's answers, so there's no chance for data entry errors, and no need to waste time typing it in again, just because the data are in a form that can be directly read by your spreadsheet program.

At this stage you will either have a mailbox full of e-mails with data that look like this:

Age: 17
Sex: 1
Item01: 4
Item02: 5
Item03: 1
[...etc]

Or comma-delimited data from a web page, which will look like this:

17,1,4,5,1
[...etc.]

In either case, you'll want to "cut" and "paste" the data into a new word-processing file. Then your goal will be to use the word processing software to "clean up" any extraneous text so that you finally have a comma-delimited file. If you collected the data using the database method, you're already done, just save the word processing file as a text (.txt) file. You will import this later into your spreadsheet program. Done! (This is the significant advantage of collecting your data in a database file)

If you have data from email submissions of your form, let's say a form with 5 items and three respondents (yours will likely have many more of both, but this example should serve) your word processing file will look like this:

Original word processing file... ...which needs to be changed to a comma-delimited format looking like this:

Age: 17
Sex: 1
Item01: 4
Item02: 5
Item03: 1

Age: 18
Sex: 0
Item01: 3
Item02: 2
Item03: 6

Age: 20
Sex: 1
Item01: 3
Item02: 1
Item03: 1

17,1,4,5,1
18,0,3,2,6
20,1,3,1,1

(NOTE: This text file has a hard return at the end of each respondent's data, and starts data for each new respondent on a new line. Commas separate each value. There are NO SPACES between data entries)

If you had to do this conversion manually, there would be no benefit over just typing in the data directly to your spreadsheet program. This is where your word processor's "find/replace" function comes in.

Find/replace works like this: You tell your word processor to mindlessly "find" particular instances of text, and then instruct it to "replace" those instances with anything you choose (other text, empty spaces, commas, or what have you).

To get from the text on the left to the text on the right above, what do you need to delete?

  1. The variable names (age, item01, etc.).
  2. The ":" signs.
  3. The blank spaces.
  4. The hard returns after each variable.
  5. The double hard return after each respondent's data.

What do you need to add or preserve?

  1. A comma between each value.
  2. A hard return after each respondent's data.

Since you want to preserve a separation between respondents, begin by replacing what is there (a double hard return) with a marker character. In Wordperfect, select Edit>Find and Replace, and in the dialog box choose Match>Specific Codes and insert a double HRt (Hard Return) in the "find" text field, and replace it with a unique character of your choosing, like "~"

Here's the window:

When you run this on the file, it will produce a file that looks like:

Age: 17
Sex: 1
Item01: 4
Item02: 5
Item03: 1~Age: 18
Sex: 0
Item01: 3
Item02: 2
Item03: 6~Age: 20
Sex: 1
Item01: 3
Item02: 1
Item03: 1

What about getting rid of those single returns at the end of each value? Replace all single returns with a comma, since you eventually want to have commas, not single hard returns, to delimit values. Do this like so:

Which will produce a file that looks like this:

Age: 17,Sex: 1,Item01: 4,Item02: 5,Item03: 1~Age: 18,Sex: 0,Item01: 3,Item02: 2,Item03: 6~Age: 20,Sex: 1,Item01: 3,Item02: 1,Item03: 1

Now, this may seem tricky at first, and with three respondents it's in fact easier to do by hand. But with 20 respondents or 200, you can imagine how much time this will save you.

Now, all you've got to get rid of is those pesky variable names and colons and empty spaces, and you'll be left with a neat comma delimited file. In WordPerfect, you are stuck with "find and replace" for each variable name, and replace with nothing (leave the replace field blank). In MS Word, you can search for "Any Text Character," and, if you've had the foresight to use only text characters in your variable names, this will get rid of all of them in one step. When you delete all variable names, you'll be left with this file:

: 17,: 1,: 4,: 5,: 1~: 18,: 0,: 3,: 2,: 6~: 20,: 1,: 3,: 1,: 1

Now clean up all those blank spaces, by search for a single space and replacing with nothing (Do this several times to make sure all of them are gone). Then search for colons and replace with nothing. This will give you:

17,1,4,5,1~18,0,3,2,6~20,1,3,1,1

Now use that special character you inserted to reintroduce those single hard returns you want between respondents. Here's the dialog box again:

Which will give you:

17,1,4,5,1
18,0,3,2,6
20,1,3,1,1

Which is what you want. Imagine this happening with very large text files that have, say 100 respondents and 100 variables per respondent. You wouldn't want to do that by hand. Obviously your particular text file will have some specific idiosyncrasies so that you shouldn't mechanically follow the steps in this example; this should, however, give you ideas o fwhat's possible to end up with a "clean," error-free, comma-delimited data file.

Now you're ready to import into a spreadsheet or data analysis program. Save this file as a text (*.txt) file, give it a name you'll remember, and proceed to the next section.