============================================================================
Outdated since the new SRNGGM package has become available.
The only reason for keeping this SRNG package is to provide the block
weight feature, which has always been rather experimental. With SRNGGM you
can emulate the block weight feature by designing a suitable metric that
retains possible block structure in the data vectors for obtaining better
relevance factor adaptation. If you do not have the slightest idea what the
historic block feature is, take the SRNGGM package.
============================================================================

Research Group: Learning with Neural Methods on Structured Data (LNM)
Department of Mathematics and Computer Science,
University of Osnabrück, Germany


Minidoc for srng package
========================
Fri Feb 22 15:11:49 2002


Intro
=====
Learning Vector Quantization (LVQ) is a prototype-based supervised neural
method for data classification. Extensions are LVQ 2.1, GLVQ and GRLVQ.
Supervised relevance neural gas (SRNG) is a combination of GRLVQ and the
neural gas method of Ritter and Martinetz.

This package provides an implementation of SRNG intended for experimental
purposes and research activity only. Don't use the implementation for any
security-related tasks.

The algorithmic background of supervised relevance neural gas (SRNG) can
be found in:

  Barbara Hammer, Marc Strickert and Thomas Villmann:
  Learning Vector Quantization for Multimodal Data.
  (http://www.inf.uos.de/lnm)

Have fun,
Marc Strickert (email: mstricke@uos.de)


Overview of this manual:

  1. Requirements
  2. Quick start
  3. srng executable
     3.A. TRAIN MODE
     3.B. TEST MODE
  4. Exemplary srng initfile
  5. Format of the data files
  6. awk - helper files


Requirements
============
The contents of this package have been tested on
  + Win2K/cygwin (1.3.3)
  + Linux (SuSE 7)
Standard Unix environments should work, too.
Below, visualization refers to the gnuplot program; if you don't have it,
just apply your favorite plotter to the output files.


Quick start
===========
Type:

  $> make                            # produces srng executable
  $> make test                       # produces chkbrd_small_cb_sort.dat
  $> gnuplot
  gnuplot> load 'chkbrd_small.plt'   # plot test data and prototype trajectories

Other make targets are:

  make test_h10             # 10-dim. data set. gnuplot -> load 'h_10data.plt'
  make mushroom_train_test  # invokes trainer.awk in test mode. Watch output.
  make mushroom_train       # invokes trainer.awk in training mode

The most instructive way to get used to the package is
  1) looking at the makefile,
  2) looking at the headers of the awk scripts,
  3) looking at the srng.[ch] files in an editor with Unix file format support.


srng executable
===============

A. TRAIN MODE

A typical invocation of the srng executable looks like this:

> cat chkbrd_small.dat|./srng chkbrd_small_init.dat 250 5 > chkbrd_small_pt.dat

srng *always* reads the training data from the standard input channel,
here provided by the cat command. In the above case, chkbrd_small_init.dat
is taken as the initialization file (see description below), 250 cycles are
run (a cycle is one presentation of the whole training file), and every 5th
cycle produces a snapshot of parameters, prototypes, and metric weights.

Equivalently, the training data can be concatenated with the initialization
file:

> cat chkbrd_small.dat chkbrd_small_init.dat|./srng - 250 5 > chkbrd_small_pt.dat

Here, the '-' indicates that the initialization has to be read from
standard input.

B. TEST MODE

The output of srng can be used for testing:

> cat chkbrd_small_tst.dat chkbrd_small_pt.dat|./srng - 0 | grep '^-1 '

This call assumes a test set in chkbrd_small_tst.dat and a corresponding
trained chkbrd_small_pt.dat.
 0 cycles: test mode
-1 cycles: test mode with elimination of prototypes with poor (<=50%)
           classification accuracy
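The per-prototype bookkeeping of the test mode can be pictured with the
following minimal C sketch. It assumes the relevance-weighted squared
Euclidean distance of GRLVQ, d(x, w) = sum_i lambda_i (x_i - w_i)^2, to put
each test vector into the receptive field of its closest prototype. A
prototype's accuracy is the share of its receptive field carrying the
prototype's class (NaN for an empty field), and the -1 elimination criterion
is modeled as accuracy <= 0.5. The functions weighted_dist, winner,
prototype_accuracy and is_bad are hypothetical names, not taken from srng.c;
how srng treats NaN prototypes during elimination is not stated here, so the
sketch simply does not flag them.

```c
#include <math.h>
#include <stdlib.h>

/* Hypothetical sketch, not code from srng.c.
 * Assumption: GRLVQ relevance-weighted squared Euclidean distance,
 * d(x, w) = sum_i lambda[i] * (x[i] - w[i])^2. */
double weighted_dist(const double *x, const double *w,
                     const double *lambda, int dim)
{
    double d = 0.0;
    for (int i = 0; i < dim; i++) {
        double diff = x[i] - w[i];
        d += lambda[i] * diff * diff;
    }
    return d;
}

/* Index of the closest prototype; x belongs to its receptive field. */
int winner(const double *x, const double *protos, int n_protos,
           const double *lambda, int dim)
{
    int best = 0;
    double best_d = weighted_dist(x, protos, lambda, dim);
    for (int j = 1; j < n_protos; j++) {
        double d = weighted_dist(x, protos + (size_t)j * dim, lambda, dim);
        if (d < best_d) {
            best_d = d;
            best = j;
        }
    }
    return best;
}

/* acc[j] = share of samples in prototype j's receptive field whose label
 * equals the prototype's class; NaN for an empty receptive field.
 * Returns 0 on success, -1 if memory allocation fails. */
int prototype_accuracy(const double *data, const int *labels, int n,
                       const double *protos, const int *proto_labels,
                       int n_protos, const double *lambda, int dim,
                       double *acc)
{
    int *hits = calloc((size_t)n_protos, sizeof *hits);
    int *total = calloc((size_t)n_protos, sizeof *total);
    if (hits == NULL || total == NULL) {
        free(hits);
        free(total);
        return -1;
    }
    for (int s = 0; s < n; s++) {
        int j = winner(data + (size_t)s * dim, protos, n_protos, lambda, dim);
        total[j]++;
        if (labels[s] == proto_labels[j])
            hits[j]++;
    }
    for (int j = 0; j < n_protos; j++)
        acc[j] = total[j] > 0 ? (double)hits[j] / total[j] : NAN;
    free(hits);
    free(total);
    return 0;
}

/* "-1 cycles" criterion as assumed here: accuracy <= 50% is bad.
 * NaN compares false, so empty receptive fields are not flagged. */
int is_bad(double acc)
{
    return acc <= 0.5;
}
```

For example, with 2-dimensional prototypes (0,0) of class 0, (1,1) of class 1
and a far-away (5,5) of class 0, equal lambdas 0.5 0.5, and one mislabeled
test vector in each of the first two receptive fields (two and three vectors
respectively), the sketch yields accuracies 0.5, 2/3 and NaN, and only the
first prototype is flagged.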
In the test command above, grep keeps only the line containing the overall
accuracy on the test set. Without grep, srng also reports the correct
classification rate of each prototype; NaN (not a number) is displayed for
prototypes with empty receptive fields.

The command

> cat chkbrd_small_tst.dat chkbrd_small_pt.dat|./srng - 0 42 | grep '^-1 '

prints the 42nd record of the chkbrd_small_pt.dat prototypes. After the
preceding per-prototype statistics lines are stripped, these lines can be
used as an initialization file for further training. The general layout of
the initialization file is shown next.


Exemplary srng initfile
=======================
Comments in parentheses below must not appear in the actual file.

**************** chkbrd_small_init.dat **************************************
(i) 9(number_prototypes) 0.25(coord-adapt-rate) 1e-6(lambda-adapt-rate) 0.0(lambda-weight-decay) 0.95(neighborhood-decay-rate) -1(neighborhood-initial-size) 0.5(ng2grlvq-fader) 0(random-seed)
(p) 0.655123 0.572721 1     (first prototype for class 1)
(p) 0.0654028 0.287699 1
(p) 0.486939 0.462719 1
(p) 0.124859 0.431446 1
(p) 0.908377 0.998554 1
(p) 0.815472 0.159003 0     (prototype order doesn't matter)
(p) 0.252102 0.160793 0     (class numbers should start with 0)
(p) 0.211522 0.975037 0     (class numbering should not contain gaps)
(p) 0.653517 0.400613 0
(l) -1
(m)
*****************************************************************************
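To make the layout of the example concrete, here is a minimal C sketch of
reading the (i) parameter line and a (p) prototype line. It assumes the
parenthesized comments are already removed and that the eight (i) values
are whitespace-separated in the order shown. The struct srng_params and the
functions parse_params and parse_prototype are hypothetical illustrations,
not the parser used in srng.c.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical container for the eight (i) values, named after the
 * parenthesized comments of the example file (order assumed as shown). */
struct srng_params {
    int number_prototypes;
    double coord_adapt_rate;
    double lambda_adapt_rate;
    double lambda_weight_decay;
    double neighborhood_decay_rate;
    double neighborhood_initial_size;
    double ng2grlvq_fader;
    long random_seed;
};

/* Reads the (i) line with comments stripped.
 * Returns 0 if all eight values were read, -1 otherwise. */
int parse_params(const char *line, struct srng_params *p)
{
    int n = sscanf(line, "%d %lf %lf %lf %lf %lf %lf %ld",
                   &p->number_prototypes, &p->coord_adapt_rate,
                   &p->lambda_adapt_rate, &p->lambda_weight_decay,
                   &p->neighborhood_decay_rate,
                   &p->neighborhood_initial_size,
                   &p->ng2grlvq_fader, &p->random_seed);
    return n == 8 ? 0 : -1;
}

/* Reads a (p) line "x_1 ... x_dim label": dim coordinates followed by a
 * non-negative integer class label. Returns 0 on success, -1 otherwise. */
int parse_prototype(const char *line, int dim, double *coords, int *label)
{
    const char *s = line;
    char *end;
    for (int i = 0; i < dim; i++) {
        coords[i] = strtod(s, &end);
        if (end == s)
            return -1; /* fewer than dim numbers on the line */
        s = end;
    }
    long c = strtol(s, &end, 10);
    if (end == s || c < 0)
        return -1; /* missing label, or label below 0 */
    *label = (int)c;
    return 0;
}
```

For instance, parse_params("9 0.25 1e-6 0.0 0.95 -1 0.5 0", &p) sets
number_prototypes to 9, and parse_prototype("0.655123 0.572721 1", 2, c,
&label) reads the first example prototype with class 1.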
(i) is the parameter initialization:
  number_prototypes:          the number of prototype lines that follow
  coord-adapt-rate:           usually between 0.0001 and 0.5
  lambda-adapt-rate:          usually between 0.0 and about coord-adapt-rate/1000
  lambda-weight-decay:        experimental; between 0 and about 1e-6
  neighborhood-decay-rate:    controls the shrinking of the neighborhood size
                              per cycle
  neighborhood-initial-size:  0 means the number of prototypes, which seems
                              useful
  ng2grlvq-fader:             how much neural gas influence (0.0 means max);
                              0.5 is often ok

(p) prototype lines

(l) lambdas / metric weights (either one value, or one value per data
    dimension). -1 means: initialize all dimensions equally; in the above
    example, -1 => 0.5 0.5. After training, this line can be used for an
    analysis of dimension relevances.

(m) optional mask line for weight adaptation.
    Values in [0,1] weight the adaptation of the corresponding lambdas.
    A 0 entry says: 'Don't adapt the corresponding weight. Keep it fixed.'
    A number > 1 must be an integer and indicates the length of a block.
    For example, for 5-dimensional data, the entries 2 3 (summing up to the
    dimension) are interpreted as 1/(2*5) 1/(2*5) 1/(3*5) 1/(3*5) 1/(3*5),
    which might be interesting for multi-dimensional features.
    Note that the trivial mask 1 ... 1 is also printed in the output. In a
    result prototype file containing many records, this line should not be
    changed, so as not to confuse the srng test mode.


Format of the data files
========================
The first line of a data file contains

  #number_of_lines dimension

where dimension excludes the class label; hence, there are dimension+1
columns. Feature vectors are arranged as space (ASCII 32) separated real
numbers, with the class label in the last column. Class labels start with 0,
and their numbering must not contain gaps.

Example from chkbrd_small.dat:

#199 2
0.0833333 0.458333 0
0.0833333 0.5 0
0.0833333 0.541667 0
[...]
0.916667 0.833333 1
0.916667 0.875 1


awk - helper files
==================
augment.awk          generate a training file with equal class distribution
cnorm.awk            normalize each column to the interval [0,1]
discrete2unary.awk   generate a real data vector representation of symbolic data
protosfromdata.awk   obtain prototypes from a training file
shuffle.awk          shuffle lines in a (data) file
toinput.awk          convert a data table into the correct srng format
trainer.awk          a center for batch mode processing (x-validation, etc.)
ztrans.awk           calculate a z-transform for columns of given data
ztrans_inv.awk       calculate an inverse z-transform for columns of given data

See the file headers.

#EOF