$Id: mgquery.c,v 1.3 1994/10/20 03:57:02 tes Exp $
Bad paragraph number.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Time:
Memory:
Sizes:
Disk:
%s %s %-*s %s
briefstats
invf
text
total
dict [stem]
weights
dict [text]
Inverted
Compressed
%7.3f Mb
process mem
%7.1f kB
total [peak]
%7d
skips
pointers
accumulators
terms
answers
index lookups
%7.1f kB (%3d seeks, %7d reads)
verbatim
%lu%%n%f%%w
%7d.%-7d %6.4f
%7d.%-7d
50
heads_length
---------------------------------- %n %w\n
ranked_doc_sepstr
---------------------------------- %n\n
doc_sepstr
\n######## PARAGRAPH %n ########\n
para_sepstr
***** Weight = %w *****\n
para_start
1048576
buffer
Unable to load compressed text.
This is probably due to lack of memory.
Doc is unexpectedly NULL
%d
terminator
pager
Unable to allocate memory for highlighting
hilite_style
mg_hilite_words
%s --style=%s --pager=%s --stem_method=%ld %s
w
Unable to run "%s"
No entries correspond to that query.
%d documents match.
%d documents retrieved.
timestats
diskstats
memstats
sizestats
%d %s %d %s#%d %d
textname
mgname
./
mgdir
all
maxdocs
casefold
stem
max_terms
max_accumulators
sorted_terms
qfreq
maxparas
accumulator_method
hash_tbl_size
stop_at_max_accum
skip_dump
term_freq
%s/%s.text%s%s%s/%s
Df:d:t:h
usage: %s [-D] [-f base name of collection] [-t base name of files for text] [-d data directory] [collection]
expert
FULL TEXT RETRIEVAL QUERY PROGRAM
Aug 27 2003
21 Mar 1994
%24s%s
mgquery version 1.2, Copyright (C) 1994 Neil Sharman
mgquery comes with ABSOLUTELY NO WARRANTY; for details type `.warranty'
This is free software, and you are welcome to redistribute it
under certain conditions; type `.conditions' for details.
%s
w
$Id: locallib.c,v 1.3 1994/10/20 03:56:51 tes Exp $
$Id: lists.c,v 1.3 1994/10/20 03:56:50 tes Exp $
$Id: query.ranked.c,v 1.4 1994/11/25 03:47:46 tes Exp $
[%2d] %7.4f
casefold
stem
Could NOT create memory to add term
%d Paragraphs were required to get %d documents
The exact weights of all %d paragraphs had to be calculated
to obtain %d paragraphs.
This may mean that the documents
returned do not necessarily represent an exact cosine ranking.
This problem may be corrected by increasing 'maxparas'.
Out of memory
$Id: stem_search.c,v 1.3 1994/10/20 03:57:04 tes Exp $
Could NOT create memory to add term
$Id: environment.c,v 1.5 1995/03/14 05:15:26 tes Exp $
on
off
yes
no
true
false
Invalid argument [true|false|yes|no|on|off] required
Not a valid number
Not in legal range [%d <= num <= %d]
all
approx-ranked
docnums
ranked
boolean
Invalid argument [boolean|ranked|docnums|approx-ranked] required
list
hash_table
splay_tree
array
Invalid argument [array|splay_tree|hash_table|list] required
hilite
heads
count
silent
text
Invalid argument [text|silent|docnums|heads|count|hilite] required
underline
bold
Invalid argument [bold|underline] required
mode
Problem in output type switch
query
Problem in query type switch
hilite_style
briefstats
diskstats
expert
MGDATA
.
mgdir
mgname
maxdocs
memstats
PAGER
more
pager
qfreq
sizestats
timestats
verbatim
sorted_terms
accumulator_method
stop_at_max_accum
1048576
buffer
50000
max_accumulators
max_terms
1000
maxparas
hash_tbl_size
skips.%d
skip_dump
---------------------------------- %n %w\n
ranked_doc_sepstr
---------------------------------- %n\n
doc_sepstr
\n######## PARAGRAPH %n ########\n
para_sepstr
***** Weight = %w *****\n
para_start
terminator
50
heads_length
1
optimise_type
casefold
stem
term_freq
$Id: commands.c,v 1.2 1994/09/20 04:41:22 tes Exp $
A query consists of two parts. The first part
identifies documents. The second part is a post-processing pattern matching
operation. Any text between the first speech mark (") and the last speech
mark is considered to be a post-processing pattern.
The following commands are available :-
    .help - displays this text.
    .quit - quits the program.
    .set name value - sets parameter "name" to "value". If the parameter
          is a boolean parameter and value is omitted the parameter
          will be inverted (i.e. if it is true it will change to false,
          if it is false it will change to true).
    .unset name - deletes parameter "name"
    .reset - sets all the parameters to their initial state.
    .display - displays the values of all the current parameters.
    .push - pushes the current parameters on to a stack.
    .pop - destroys the current parameters and pops a new set
          of parameters off the stack.
    .output arg - This is used to specify where to send the documents.
          Arg may be one of the following:
              > filename : Send output to the specified file.
              >> filename : Append output to the specified file.
              | command : The output is piped into command,
                          which is executed by sh.
    .input arg - This is used to specify where input comes from.
              < filename : Get the input from the specified
                           file.
              | command : The input comes from the standard
                          output of command, which is executed
                          by sh.
On startup the mgquery program reads from the file .mgrc a sequence
of commands (NOTE: The .mgrc file may not contain any queries). mgquery
first looks for .mgrc in the current directory and then in the user's home
directory. Lines starting with a '#' in the .mgrc file are considered to
be comments and are ignored.
The following parameters (used in the .set and .unset commands) are
predefined and have special significance :-
accumulator_method = `array'
    This parameter is used during ranking, and specifies how the
    weight for each document should be accumulated. The following
    methods are available: `array', `splay_tree', `hash_table', and
    `list'.
briefstats = `off'
    This is a boolean parameter that determines whether the
    totals for disk, memory and time usage statistics will be
    displayed at the end of each query.
    NOTE: this takes precedence over the parameters "diskstats",
    "memstats" and "timestats". This parameter may take the values
    `yes', `no', `true', `false', `on' or `off'.
buffer = `1048576'
    When the documents are being read in they are read into a
    buffer of this size and then displayed from this buffer. If
    the documents are larger than this buffer the buffer is
    expanded automatically. Having a large buffer gives a very
    slight performance improvement because it allows the order of
    disk operations to be optimised. The buffer size is measured
    in bytes.
diskstats = `off'
    This is a boolean parameter that determines whether the disk
    usage statistics for the preceding query will be displayed
    after each query. This parameter may take the values `yes',
    `no', `true', `false', `on' or `off'.
doc_sepstr = `---------------------------------- %n\n'
    This specifies the string that will be used to separate
    documents when they are displayed for `boolean' or `docnums'
    queries. The standard C escape character sequences (see the
    man page) may be used to place special characters in the
    string. For example, a newline would be `\n'. To include a `%'
    use the sequence `%%'. To include the MG document number use
    the sequence `%n'.
expert = `false'
    If this is true then a lot of the waffle that the program
    spits out is suppressed. This parameter may take the values
    `yes', `no', `true', `false', `on' or `off'.
hash_tbl_size = `1000'
    One of the options during ranked queries is to use a hash
    table to accumulate the weights for each document. The hash
    table is a simple chained type. This parameter specifies the
    size of the hash table and may take any value between 8 and
    268435456.
heads_length = `50'
    When the mode is `heads' this specifies the number of
    characters that will be output for each document.
maxdocs = `all'
    The maximum number of documents to display in response to a
    query. This parameter may take on a numeric value between 1
    and 429467295 or the word `all'.
maxparas = `1000'
    The maximum number of paragraphs to identify during a ranked
    query with paragraph indexing. After the paragraphs have been
    identified the paragraphs are converted into documents, and
    because some of the paragraphs may refer to the same documents
    the final number of answers may be less than maxparas. The
    maxdocs parameter will then be applied. This parameter may
    take on a numeric value between 1 and 429467295.
max_accumulators = `50000'
    This parameter limits the number of different paragraph/
    document numbers to be accumulated during ranked queries when
    the parameter `accumulator_method' is set to `splay_tree',
    `hash_table', or `list'. This parameter may take any value
    between 8 and 268435456.
max_terms = `all'
    This parameter limits the number of terms that will actually
    be used during a ranked query. If more terms than the number
    specified by max_terms are entered, then the extra terms will
    be discarded. If `sorted_terms' is on then the limiting will
    be done after the terms have been sorted. This parameter may
    take any value between 1 and 429467295 or the word `all'.
memstats = `off'
    This is a boolean parameter that determines whether the memory
    usage statistics for the preceding query will be displayed
    after each query. This parameter may take the values `yes',
    `no', `true', `false', `on' or `off'.
mgdir = `.'
    This specifies the directory where the MG files may be found.
    If the environment variable `MGDATA' is set then `mgdir' is
    initialised to the value in `MGDATA'.
mgname = `'
    This specifies the name of the MG database to process.
mode = `text'
    This specifies how documents should be displayed when they
    are retrieved. It may take five different values: `text',
    `docnums', `silent', `heads' or `count'. `text' displays
    the contents of the document. `docnums' displays only the
    document numbers. `silent' retrieves all the documents but
    displays nothing except how many documents were retrieved.
    This mode is intended to be used in timing experiments.
    `heads' is used to print out the head of each document.
    `count' does the minimum amount of work required to determine
    how many documents would be retrieved, but does not retrieve
    them.
pager = `more'
    This is the name of the program that will be used to display
    the help and the retrieved documents. If the environment
    variable "PAGER" is defined then `pager' takes on that value.
para_sepstr = `\n######## PARAGRAPH %n ########\n'
    This specifies the string that will be used to separate
    paragraphs. The standard C escape character sequences (see the
    man page) may be used to place special characters in the
    string. For example, a newline would be `\n'. To include a `%'
    use the sequence `%%'. To include the paragraph number within
    the document use the sequence `%n'.
para_start = `***** Weight = %w *****\n'
    This specifies the string that will be used at the head of
    paragraphs for a paragraph level index following a ranked query.
    The standard C escape character sequences (see the man page)
    may be used to place special characters in the string. For
    example, a newline would be `\n'. To include a `%' use the
    sequence `%%'. To include the paragraph weight use the
    sequence `%w'.
qfreq = `true'
    This determines whether the ranked queries will take into
    account the number of times each query term is specified.
    When this is `true' the number of times a term appears in
    the query is used in the ranking. When this is `false' all
    query terms are assumed to occur only once. This parameter
    may take the values `yes', `no', `true', `false', `on' or
    `off'.
query = `boolean'
    This specifies the type of queries that are to be entered.
    It can take four different values: `boolean', `ranked',
    `docnums' or `approx-ranked'.
    `boolean' is for boolean queries.
    The yacc grammar for boolean queries is as follows :-
        query : or ;
        or    : or '|' and
              | and ;
        and   : and '&' not
              | and not
              | not ;
        not   : term
              | '!' not ;
        term  : TERM
              | '(' or ')' ;
    `ranked' and `approx-ranked' are for queries ranked by the
    cosine measure. `approx-ranked' uses only the low
    precision document lengths, and therefore only
    produces an approximation to full cosine ranking.
        query : TERM
              | query TERM ;
    `docnums' allows the entry of document numbers. Multiple
    numbers separated by spaces may be specified
    or ranges separated by hyphens.
        query : range
              | query range ;
        range : num
              | num '-' num ;
ranked_doc_sepstr = `---------------------------------- %n %w\n'
    This specifies the string that will be used to separate
    documents when they are displayed for `ranked' or
    `approx-ranked' queries. The standard C escape character
    sequences (see the man page) may be used to place special
    characters in the string. For example, a newline would be
    `\n'. To include a `%' use the sequence `%%'. To include the
    MG document number use the sequence `%n'. To include the
    document weight use the sequence `%w'.
sizestats = `false'
    If this is true then various numbers are output at the end
    of each query indicating what went on during the query.
    This parameter may take the values `yes', `no', `true',
    `false', `on' or `off'.
skip_dump = `skips.%d'
    If this parameter is set then during ranked queries on skipped
    inverted files when `accumulator_method' is set to `splay_tree',
    `hash_table', or `list' a file will be produced in the current
    directory. The name of the file is the value of this parameter;
    a `%d' in the file name will be replaced with the process id of
    mgquery. This file will contain information about the usage of
    skips during the query processing. This option is expensive;
    use `.unset skip_dump' to obtain optimal performance.
sorted_terms = `on'
    This specifies whether or not the terms should be sorted by
    how often they occur in documents, so that the least often
    occurring terms are processed first when ranked queries are
    being done. When this is true the terms are sorted. When this
    is false the terms are not sorted and are instead processed in
    order of occurrence. This parameter may take the values `yes',
    `no', `true', `false', `on' or `off'.
stop_at_max_accum = `on'
    This specifies what should happen when the maximum number of
    accumulators set by `max_accumulators' is reached. When this
    is true the processing of terms is stopped at the completion
    of the current term. When this is false processing continues but
    no new accumulators are created. This parameter may take the
    values `yes', `no', `true', `false', `on' or `off'.
terminator = `'
    This specifies the string that will be output after the last
    document from the previous query has been output. The standard
    C escape character sequences (see the man page) may be used to
    place special characters in the string. For example, a newline
    would be `\n'. To include a `%' use the sequence `%%'.
timestats = `false'
    If this is true then the time to process a query is displayed
    in both real time and CPU time. This parameter may take the
    values `yes', `no', `true', `false', `on' or `off'.
verbatim = `off'
    This is a boolean parameter that determines whether the program
    should attempt to do a regular expression match on the retrieved
    text. If verbatim is `on' and a post-processing string is specified
    with the query then the post-processing string will be searched for
    in the documents just before they are displayed. If the string is
    found the document will be displayed, if not the document will not
    be displayed. If verbatim is `off' the post-processing string will
    be considered a regular expression like in `vi' or `egrep'.
    E.g. If verbatim is `on', "and.*the" will look for the 8 character
    sequence "and.*the". If verbatim is `off', "and.*the" will
    look for the sequence "and" followed somewhere later in the
    document by the sequence "the".
    This parameter may take the values `yes', `no', `true', `false',
    `on' or `off'.
$Id: stemmer.c,v 1.3 1994/10/20 03:57:05 tes Exp $
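The boolean grammar in the `query` parameter help is left-recursive. That suits yacc, but a hand-written parser cannot use it directly. As a rough sketch of how the same grammar can be evaluated by recursive descent (this is not mgquery's own yacc-generated parser), the left recursion can be rewritten as loops. The function `bool_match`, the `Parser` struct and the representation of a document as a string of space-separated words are all hypothetical. TERM is assumed to be a run of letters or digits matched literally, with no case folding or stemming.

```c
/* Sketch: recursive-descent evaluator for mgquery's boolean grammar.
 * Left recursion is rewritten as loops:
 *   or  : and { '|' and }
 *   and : not { ['&'] not }   (juxtaposed terms are ANDed)
 *   not : '!' not | term
 *   term: TERM | '(' or ')'
 * Assumption: TERM is a run of alphanumerics matched literally against a
 * space-separated list of document words (no case folding or stemming). */
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef struct {
    const char *p;   /* current position in the query text */
    const char *doc; /* space-separated words of the document */
    int error;       /* set to 1 on a syntax error */
} Parser;

/* Returns 1 if the n-character word w occurs as a whole word in doc. */
static int doc_has(const char *doc, const char *w, size_t n) {
    while (*doc) {
        while (*doc == ' ') doc++;
        const char *start = doc;
        while (*doc && *doc != ' ') doc++;
        if ((size_t)(doc - start) == n && strncmp(start, w, n) == 0)
            return 1;
    }
    return 0;
}

static void skip_space(Parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
}

static int parse_or(Parser *ps);

/* not : term | '!' not ;   term : TERM | '(' or ')' ; */
static int parse_not(Parser *ps) {
    skip_space(ps);
    if (*ps->p == '!') {
        ps->p++;
        return !parse_not(ps);
    }
    if (*ps->p == '(') {
        ps->p++;
        int v = parse_or(ps);
        skip_space(ps);
        if (*ps->p == ')') ps->p++;
        else ps->error = 1;          /* missing closing parenthesis */
        return v;
    }
    if (isalnum((unsigned char)*ps->p)) {
        const char *start = ps->p;
        while (isalnum((unsigned char)*ps->p)) ps->p++;
        return doc_has(ps->doc, start, (size_t)(ps->p - start));
    }
    ps->error = 1;                   /* nothing that can start a term */
    return 0;
}

/* and : and '&' not | and not | not ; */
static int parse_and(Parser *ps) {
    int v = parse_not(ps);
    for (;;) {
        skip_space(ps);
        char c = *ps->p;
        if (c == '&') ps->p++;
        else if (c != '!' && c != '(' && !isalnum((unsigned char)c)) break;
        int rhs = parse_not(ps);     /* always parse, so input is consumed */
        v = v && rhs;
    }
    return v;
}

/* or : or '|' and | and ; */
static int parse_or(Parser *ps) {
    int v = parse_and(ps);
    for (;;) {
        skip_space(ps);
        if (*ps->p != '|') break;
        ps->p++;
        int rhs = parse_and(ps);
        v = v || rhs;
    }
    return v;
}

/* query : or ;  Returns 1 (match), 0 (no match) or -1 (syntax error). */
int bool_match(const char *query, const char *doc) {
    assert(query != NULL && doc != NULL);
    Parser ps = { query, doc, 0 };
    int v = parse_or(&ps);
    skip_space(&ps);
    if (ps.error || *ps.p != '\0') return -1;
    return v;
}
```

For example, `bool_match("cat & !dog", "the cat sat")` returns 1. Both operands of `&` and `|` are always parsed, even when the result is already known, so that the whole query is consumed and trailing syntax errors are still reported.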
%s
expert
Enter a command or query (.quit to terminate, .help for assistance).
%s %s
%s > 
Unable to allocate memory for the line
? Files required for level 2 and 3 inversion are missing
MG_BUFTOOSMALLL
Error reading "%s"
Bad magic number in "%s"
File "%s" not found
Out of memory
No error
%s
%s
rb
.t
.tdf
.td
.tda
.idb
.ib1
.ib2
.ib3
.i
.tiw
.wa
.ti
%s/%s.ip
Unable to open 'paraFile'.
Unexpected EOF while reading '%s'.
The compressed text buffer is NULL
No memory for TextBuffer
%d >= %d
Accessing the hash_table through HT_find after HT_sorting
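The `hash_tbl_size` help describes the hash-table accumulator as a simple chained table, and `max_accumulators` caps how many accumulators may be created. Here is a minimal sketch of such a table. It assumes that document numbers are hashed by taking them modulo the table size. The names `AccTable`, `acc_init`, `acc_add`, `acc_find` and `acc_free` are hypothetical and are not the `HT_` routines mentioned in the message above.

```c
/* Sketch of a chained hash table of ranking accumulators, assuming
 * document numbers are hashed by taking them modulo the table size. */
#include <assert.h>
#include <stdlib.h>

typedef struct Acc {
    unsigned long doc;  /* document (or paragraph) number */
    float weight;       /* partial similarity accumulated so far */
    struct Acc *next;   /* next accumulator in the same chain */
} Acc;

typedef struct {
    Acc **chains;       /* one linked list per bucket */
    size_t size;        /* number of buckets, cf. hash_tbl_size */
    size_t count;       /* accumulators created so far */
    size_t limit;       /* cf. max_accumulators */
} AccTable;

/* Returns 1 on success, 0 if the bucket array cannot be allocated. */
int acc_init(AccTable *t, size_t size, size_t limit) {
    assert(t != NULL && size > 0);
    t->chains = calloc(size, sizeof *t->chains);
    t->size = size;
    t->count = 0;
    t->limit = limit;
    return t->chains != NULL;
}

/* Adds w to doc's accumulator. If doc has none and the limit has been
 * reached, nothing is created and 0 is returned; this models
 * stop_at_max_accum = `off' (keep going, but no new accumulators). */
int acc_add(AccTable *t, unsigned long doc, float w) {
    Acc **head = &t->chains[doc % t->size];
    for (Acc *a = *head; a != NULL; a = a->next) {
        if (a->doc == doc) {
            a->weight += w;
            return 1;
        }
    }
    if (t->count >= t->limit) return 0;
    Acc *a = malloc(sizeof *a);
    if (a == NULL) return 0;
    a->doc = doc;
    a->weight = w;
    a->next = *head;    /* push onto the front of the chain */
    *head = a;
    t->count++;
    return 1;
}

/* Returns doc's accumulated weight, or NULL if it has no accumulator. */
const float *acc_find(const AccTable *t, unsigned long doc) {
    for (const Acc *a = t->chains[doc % t->size]; a != NULL; a = a->next)
        if (a->doc == doc) return &a->weight;
    return NULL;
}

void acc_free(AccTable *t) {
    if (t->chains == NULL) return;
    for (size_t i = 0; i < t->size; i++) {
        Acc *a = t->chains[i];
        while (a != NULL) {
            Acc *next = a->next;
            free(a);
            a = next;
        }
    }
    free(t->chains);
    t->chains = NULL;
}
```

New entries are pushed onto the front of their chain, so an insertion costs O(1) once the lookup has missed. The table size only affects chain length, so a larger `hash_tbl_size` trades memory for shorter scans.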
invf_get.c
Unexpected EOF in "%s" on line %d
%s Skipping method %ld
Skipping is every %ld docnums
Max nodes = %ld
No skips smaller or equal to %ld
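The skipping diagnostics above (for example `Skipping is every %ld docnums`) refer to skipped inverted files. In these files, a list of document numbers carries periodic skip entries so that a lookup can jump over runs it does not need to decode. This sketch shows only the search idea. It works on an uncompressed, strictly increasing array with a skip every `k` entries. MG's real skips sit inside compressed inverted file entries, and `skip_search` is a hypothetical name.

```c
/* Sketch of skipping through a sorted, strictly increasing list of
 * document numbers with a skip every k entries (uncompressed array; a
 * real skipped inverted file works on compressed runs). */
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first entry >= target, or n if none. A skip
 * of k entries is taken while the entry it lands on is still <= target,
 * so the final linear scan advances at most k entries. */
size_t skip_search(const unsigned long *docs, size_t n, size_t k,
                   unsigned long target, size_t *skips_taken) {
    assert(k > 0 && skips_taken != NULL);
    size_t i = 0;
    *skips_taken = 0;
    while (i + k < n && docs[i + k] <= target) {
        i += k;                 /* take the skip */
        (*skips_taken)++;
    }
    while (i < n && docs[i] < target)
        i++;                    /* finish with a short linear scan */
    return i;
}
```

A single lookup costs about n/k skips plus at most k scanned entries. That total is smallest when k is near the square root of n. Compression changes the real trade-off, so treat this only as intuition.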
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
%3d : %8ld %6ld "%.*s"
%*s
%4d
Unable to allocate memory for a splay node
%6d %6d %6d %6d
%12.2f
$Id: term_lists.c,v 1.1 1994/10/20 03:57:07 tes Exp $
Unable to allocate term list
Unable to resize term list
Could NOT create memory to add term
we->word_num = %d
we->count = %ld
we->doc_count = %ld
we->max_doc_count = %ld
we->invf_ptr = %ld
we->invf_len = %ld
Term Entry
te->Count = %d
te->Word = %s
te->Stem = %s
te->require_match = %i
Term List
tl->list_size = %d
tl->num = %d
[%d]
$Id: bool_tree.c,v 1.2 1995/03/14 05:15:24 tes Exp $
Can not create bool_tree_node for boolean query
PrintBinaryOp
bool_tree.c
( (tree)->tag==N_and || (tree)->tag==N_or || (tree)->tag==N_or_terms || (tree)->tag==N_not || (tree)->tag==N_diff)
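The assertion above lists the boolean tree node tags `N_and`, `N_or`, `N_or_terms`, `N_not` and `N_diff`. A common way to evaluate such nodes is to merge sorted lists of document numbers. This sketch shows AND, OR and difference (`a AND NOT b`) over uncompressed, sorted, duplicate-free arrays. It is not the code in `bool_tree.c`. A standalone NOT is left out because it needs the set of all documents. The function names are hypothetical.

```c
/* Sketch: merging sorted, duplicate-free document-number lists, one way
 * to evaluate AND, OR and "difference" nodes of a boolean query tree.
 * Each function writes to out and returns the number of entries written;
 * out needs room for na + nb entries (OR), min(na, nb) (AND) or na (DIFF). */
#include <assert.h>
#include <stddef.h>

size_t list_and(const unsigned long *a, size_t na,
                const unsigned long *b, size_t nb, unsigned long *out) {
    assert(out != NULL);
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) i++;
        else if (a[i] > b[j]) j++;
        else { out[k++] = a[i]; i++; j++; }   /* in both lists */
    }
    return k;
}

size_t list_or(const unsigned long *a, size_t na,
               const unsigned long *b, size_t nb, unsigned long *out) {
    assert(out != NULL);
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) out[k++] = a[i++];
        else if (a[i] > b[j]) out[k++] = b[j++];
        else { out[k++] = a[i]; i++; j++; }   /* emit shared entry once */
    }
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
    return k;
}

/* Entries of a that are not in b ("a AND NOT b"). */
size_t list_diff(const unsigned long *a, size_t na,
                 const unsigned long *b, size_t nb, unsigned long *out) {
    assert(out != NULL);
    size_t j = 0, k = 0;
    for (size_t i = 0; i < na; i++) {
        while (j < nb && b[j] < a[i]) j++;
        if (j == nb || b[j] != a[i]) out[k++] = a[i];
    }
    return k;
}
```

Each merge is linear in `na + nb`, so combining the shortest lists first keeps intermediate AND results small.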
%c
PrintUnaryOp
(term %d
TRUE
FALSE
)
No memory for bool tree copy
%s
syntax error
yacc stack overflow
$Id: bool_optimiser.c,v 1.4 1995/07/27 04:54:55 tes Exp $
DNF infinite loop
Unexpected "all" node in the parse tree.
Unexpected "not" node in the parse tree.
Original Number = %d
Final Number = %d
optimise_type
Unable to allocate query term list
Unable to resize query term list
Could NOT create memory to add term
$Id: memlib.c,v 1.1 1994/08/22 00:24:47 tes Exp $
$Id: huffman.c,v 1.1 1994/08/22 00:24:44 tes Exp $
$Id: messages.c,v 1.1 1994/08/22 00:24:48 tes Exp $
: %s
%s
%s
$Id: bitio_gen.c,v 1.1 1994/08/22 00:24:38 tes Exp $
Error: Cannot unary encode %lu
Error: Cannot binary encode %lu
Error: Cannot gamma encode %lu
Error: Cannot bblock encode %lu
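The `bitio_gen.c` messages above name the unary, binary, gamma and bblock codes. This sketch covers two of them. It assumes the convention in which unary(x) is x - 1 one bits followed by a zero bit. Under that convention, the gamma code of x writes floor(log2 x) + 1 in unary, followed by the low floor(log2 x) bits of x. Zero has no gamma code, which is one situation where an encoder would have to report an error like `Cannot gamma encode %lu`. The function names are hypothetical, and bits are written as characters for readability rather than packed.

```c
/* Sketch of unary and Elias gamma codes, written as '0'/'1' characters
 * rather than packed bits. Convention assumed: unary(x) is x - 1 one
 * bits followed by a zero bit. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes unary(x) for x >= 1 into buf; returns the number of bits. */
static size_t unary_encode(unsigned long x, char *buf) {
    assert(x >= 1);
    size_t n = 0;
    for (unsigned long i = 1; i < x; i++) buf[n++] = '1';
    buf[n++] = '0';
    return n;
}

/* Gamma code: unary(b + 1), where b = floor(log2 x), followed by the
 * low b bits of x in binary. Zero has no gamma code, so x == 0 yields
 * an empty string and a return value of 0. With a 64-bit unsigned long,
 * buf needs room for 127 bits plus the terminating '\0'. */
size_t gamma_encode(unsigned long x, char *buf) {
    if (x == 0) { buf[0] = '\0'; return 0; }
    unsigned b = 0;
    for (unsigned long t = x >> 1; t != 0; t >>= 1) b++;  /* b = floor(log2 x) */
    size_t n = unary_encode((unsigned long)b + 1, buf);
    for (unsigned i = b; i-- > 0; )                       /* bits b-1 .. 0 */
        buf[n++] = ((x >> i) & 1UL) ? '1' : '0';
    buf[n] = '\0';
    return n;
}
```

For example, 9 is 1001 in binary, so b = 3. The code is therefore unary(4) = 1110 followed by 001, seven bits in all.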
$Id: filestats.c,v 1.1 1994/08/22 00:24:42 tes Exp $
$Id: sptree.c,v 1.1 1994/08/22 00:24:50 tes Exp $
$Id: local_strings.c,v 1.2 1994/07/05 01:17:15 tes Exp $
\n
\b
\f
\t
\\
\"
\'
\%03o
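The strings above from `local_strings.c` are the escaped forms of newline, backspace, form feed, tab, backslash and quotes, with `\%03o` used for other characters. The parameter help says that separator strings such as `doc_sepstr` accept C escapes together with `%n`, `%w` and `%%`. Here is a minimal sketch of expanding such a string. It assumes the weight is printed with `%.4f`, since mgquery's own format for separators is not shown here. `expand_sep` is a hypothetical name, not an MG function.

```c
/* Sketch of expanding a separator string such as doc_sepstr or
 * ranked_doc_sepstr. Backslash escapes: \n \t \b \f, up to three octal
 * digits, and any other character after '\' stands for itself (so \\, \"
 * and \' work). Percent sequences: %n -> document number, %w -> weight
 * (assumed "%.4f"), %% -> '%'; any other %x is copied unchanged.
 * Output is truncated to fit outsz. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

static void put(char *out, size_t outsz, size_t *n, char c) {
    if (*n + 1 < outsz) out[(*n)++] = c;   /* keep room for '\0' */
}

size_t expand_sep(const char *tmpl, unsigned long docnum, double weight,
                  char *out, size_t outsz) {
    assert(tmpl != NULL && out != NULL && outsz > 0);
    size_t n = 0;
    char num[64];
    for (const char *p = tmpl; *p != '\0'; p++) {
        if (*p == '\\' && p[1] != '\0') {
            p++;
            if (*p == 'n') put(out, outsz, &n, '\n');
            else if (*p == 't') put(out, outsz, &n, '\t');
            else if (*p == 'b') put(out, outsz, &n, '\b');
            else if (*p == 'f') put(out, outsz, &n, '\f');
            else if (*p >= '0' && *p <= '7') {
                int v = 0;
                for (int k = 0; k < 3 && *p >= '0' && *p <= '7'; k++, p++)
                    v = v * 8 + (*p - '0');
                p--;                      /* outer loop's p++ skips the last digit */
                put(out, outsz, &n, (char)v);
            } else put(out, outsz, &n, *p);
        } else if (*p == '%' && p[1] != '\0') {
            p++;
            if (*p == 'n') snprintf(num, sizeof num, "%lu", docnum);
            else if (*p == 'w') snprintf(num, sizeof num, "%.4f", weight);
            else if (*p == '%') strcpy(num, "%");
            else snprintf(num, sizeof num, "%%%c", *p);
            for (const char *q = num; *q != '\0'; q++) put(out, outsz, &n, *q);
        } else {
            put(out, outsz, &n, *p);
        }
    }
    out[n] = '\0';
    return n;
}
```

With the default `ranked_doc_sepstr`, document 42 with weight 0.5 would expand to the line of dashes followed by ` 42 0.5000` and a newline.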
$Id: stem.c,v 1.2 1994/09/20 04:20:44 tes Exp $
$Id: timing.c,v 1.1 1994/08/22 00:24:53 tes Exp $
%02.0f:%02.0f:%05.2f cpu, %02d:%02d:%02d elapsed.
%02.0f:%02.0f:%05.2f
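The timing strings above print CPU time as hours, minutes and seconds using `%02.0f:%02.0f:%05.2f`. This is a sketch of splitting a duration in seconds into those fields. The function name `format_cpu` is hypothetical. The duration is rounded to hundredths before it is split. As a result, 59.996 s prints as `00:01:00.00` rather than the `00:00:60.00` that a naive split would give.

```c
/* Sketch: formatting a duration in seconds with the hh:mm:ss.ss layout
 * of the "%02.0f:%02.0f:%05.2f" string. Rounding to hundredths first
 * keeps the seconds field below 60.00. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

void format_cpu(double secs, char *buf, size_t size) {
    assert(secs >= 0.0 && buf != NULL);
    long cs = (long)(secs * 100.0 + 0.5);   /* total hundredths of a second */
    long hours = cs / 360000;
    long mins = (cs / 6000) % 60;
    double rest = (double)(cs % 6000) / 100.0;
    snprintf(buf, size, "%02.0f:%02.0f:%05.2f",
             (double)hours, (double)mins, rest);
}
```

For example, 3725.5 seconds is formatted as `01:02:05.50`.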
#######################################################################
#
# 1 Word number.
# 2 Number of times the word occurs in the collection.
# 3 Number of documents the word occurs in.
# 4 The word.
# 5 Number of docnum/word_count entries decoded.
# 6 Number of hits.
# 7 Number of entries in the splay tree or hash table.
# 8 Amount added to the accumulators while processing this word.
#
#######################################################################
#######################################################################
#
# 1 Word number.
# 2 Number of times the word occurs in the collection.
# 3 Number of documents the word occurs in.
# 4 The word.
# 5 Number of skips taken in evaluating the inverted file entry.
# 6 Number of docnum/word_count entries decoded.
# 7 Number of hits.
# 8 Number of entries in the splay tree or hash table.
# 9 Amount added to the accumulators while processing this word.
#
#######################################################################
#######################################################################
#
# 1 Word number.
# 2 Number of times the word occurs in the collection.
# 3 Number of documents the word occurs in.
# 4 The word.
# 5 Skip size.
# 6 Number of skips taken in evaluating the inverted file entry.
# 7 Number of docnum/word_count entries decoded.
# 8 Number of hits.
# 9 Number of entries in the splay tree or hash table.
# 10 Amount added to the accumulators while processing this word.
#
#######################################################################  