Breaks up a text structure into individual words.
Option
| SEPARATOR = text | Defines the characters separating the words in the original text; default ' ,;:.' |
|---|---|
Parameters
| TEXT = texts | Text to break into words |
|---|---|
| WORDS = texts | Saves the words contained in each text (in the order in which they occur) |
| COLUMNS = variates | Saves the number of the column in the TEXT where each word began |
| LINES = variates | Saves the number of the line where each word was found |
| PLACESINLINES = variates | Saves the place of each word (first, second &c) within the line where it was found |
Description
The TXBREAK directive forms a text containing all the words (including duplicates) found in a text. The original text to break up is supplied by the TEXT parameter, and the WORDS parameter saves a text storing the words that it contains. The words are stored in the order in which they occur in the original text (although you could, for example, use the SORT directive to sort them into alphabetical order). The LINES parameter can save a variate recording the line in the original text where each word was found. The COLUMNS parameter can save a variate recording the column where each word began, and the PLACESINLINES parameter can save a variate giving the place of each word (first, second &c) within the line where it was found.
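As a minimal sketch of the parameters described above, the following statements break up a short two-line text and save every available output. The identifiers Demo, DWords, DLines, DColumns and DPlaces are hypothetical names chosen purely for illustration.

```genstat
"Hypothetical two-line text, used only for illustration"
TEXT Demo; VALUES=!t('one two, three', 'four; five')
"Save the words, with the line, starting column and place of each"
TXBREAK Demo; WORDS=DWords; LINES=DLines; COLUMNS=DColumns;\
  PLACESINLINES=DPlaces
PRINT DWords,DLines,DColumns,DPlaces
```

Each saved structure should hold one value per word, so the four can be printed side by side to check where every word came from.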
By default, the words are assumed to be separated from one another by spaces or by any of the standard punctuation characters (comma, semi-colon, colon, full stop). However, you can use the SEPARATOR option to specify some other characters. For example, you could put SEPARATOR=' ,;:.?' to allow question marks as well. These characters are all removed from the words when they are stored.
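The effect of the SEPARATOR option can be sketched by breaking the same text twice, once with the default separators and once with the question mark added. The identifiers Query, WDefault and WQuery are hypothetical names, not part of Genstat.

```genstat
"Hypothetical one-line text containing a question mark"
TEXT Query; VALUES=!t('Ready? Yes, ready.')
"Default separators: the question mark is not among them"
TXBREAK Query; WORDS=WDefault
"Add the question mark to the separating characters"
TXBREAK [SEPARATOR=' ,;:.?'] Query; WORDS=WQuery
PRINT WDefault
PRINT WQuery
```

Because only the separating characters are removed, the question mark would be expected to stay attached to the preceding word in WDefault and to be dropped in WQuery.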
Option: SEPARATOR.
Parameters: TEXT, WORDS, COLUMNS, LINES, PLACESINLINES.
Action with RESTRICT
TXBREAK takes account of any restrictions on the original text, and omits the words in the lines that the restriction excludes.
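A minimal sketch of this behaviour, assuming a three-line text restricted to its first and third lines, is given below. The identifiers Lines3, Keep, WKept and LKept are hypothetical names chosen for illustration.

```genstat
"Hypothetical three-line text and an indicator variate"
TEXT Lines3; VALUES=!t('first line', 'second line', 'third line')
VARIATE [VALUES=1,0,1] Keep
"Restrict the text to the lines where Keep equals 1"
RESTRICT Lines3; CONDITION=Keep .EQ. 1
TXBREAK Lines3; WORDS=WKept; LINES=LKept
PRINT WKept,LKept
"Remove the restriction afterwards"
RESTRICT Lines3
```

The words from the second line should be absent from WKept, and LKept can be inspected to confirm which lines contributed.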
See also
Directives: TEXT, CONCATENATE, EDIT, TXCONSTRUCT, TXFIND, TXPOSITION, TXREPLACE.
Procedure: TXSPLIT.
Functions: CHARACTERS, GETFIRST, GETLAST, GETPOSITION, POSITION.
Commands for: Calculations and manipulation.
Example
"Example 1:4.7.3, 1:4.7.4 and 1:4.7.6"
TEXT Intro6; VALUES=!t(\
'Genstat has very comprehensive facilities for Analysis of Variance.',\
'Almost all of these can be accessed using custom menus. In this',\
'chapter, we start with the simplest design, a one-way completely',\
'randomized experiment, before introducing factorial experiments,',\
'which have more than one treatment or fixed effect. We use an',\
'experiment with a randomized block design to show how to deal with',\
'blocks, which involve more than one stratum or source of error in',\
'the analysis, and extend this idea by analysing a split-plot design.',\
'Many other types of design can also be analysed by Genstat, and',\
'details are available in Chapter 4 of Part 2 of the Guide to',\
'Genstat. We also introduce some of Genstat''s extensive facilities',\
'for creating designed experiments, available from the Design option',\
'of the Stats menu.')
TXPOSITION Intro6; SUBTEXT='Genstat'; POSITION=Where
TXPOSITION Intro6; SUBTEXT='Genstat'; POSITION=Next; SKIP=Where
PRINT Where,Next; DECIMALS=0
TXFIND [DISTINCT=left,right] Intro6; SUBTEXT='the';\
  COLUMN=column; LINE=line
PRINT [SQUASH=yes] line,column & Intro6$[line] & '!'; FIELD=column
FOR [NTIMES=999]
  TXFIND [DISTINCT=left,right] Intro6; SUBTEXT='the';\
    COLUMN=column; LINE=line; ICOLUMN=column+1; ILINE=line
  EXIT line .EQ. 0
  PRINT [SQUASH=yes] line,column & Intro6$[line] & '!'; FIELD=column
ENDFOR
TXBREAK Intro6; WORDS=Words
GROUP [CASE=ignored; REDEFINE=yes] Words
TABULATE [PRINT=count; CLASSIFICATION=Words]