This tutorial is for users who are starting to use Namazu 2.0.
This tutorial is intended to reduce the effort of getting started with Namazu. Please refer to the manual for a full description of Namazu's features. The installation guide is in the INSTALL file.
The history of Namazu development from 1.3.0.x through 2.0 is as follows.
Namazu consists of three major components: mknmz (creates indexes), namazu (searches from the command line), and namazu.cgi (searches from the web).
You need the following software to build Namazu 2.0.
Name | Description | Status | Current Version | Required Version | File name | Development and Distribution | Sources (Example) | Others |
---|---|---|---|---|---|---|---|---|
Perl | Perl language | Required | 5.8.8 | >= 5.004 | perl5.005_03.tar.gz | Larry Wall GNU CPAN | CPAN | |
make | Maintains groups of programs | Required only if the make bundled with your system cannot build Namazu | 3.81 | | make-3.81.tar.gz | FSF | GNU | |
gettext | Message translation | Required only for multilingual messages | 0.14.6 | >= 0.13.1 | gettext-0.14.6.tar.gz | FSF | GNU | Required on Solaris. |
nkf | Network Kanji Filter | For Japanese processing only | 2.0.7 | >= 1.71 | nkf207.tar.gz | Shinji Kono, Rei FURUKAWA | nkf_utf8 | Avoid versions 1.90, 1.92, and 2.0.0 - 2.0.3 (see notes). |
NKF | nkf Perl module | For Japanese processing only. ++ | 2.0.7 | >= 1.71 | | | | |
KAKASI | Japanese/Romaji conversion | For Japanese processing only. ** | 2.3.4 | >= 2.x | kakasi-2.3.4.tar.gz | KAKASI Project | namazu.org | |
Text::Kakasi | KAKASI Perl module | For Japanese processing only. ++ | 2.04 | >= 1.05 | Text-Kakasi-2.04.tar.gz | NOKUBI Takatsugu, Dan Kogai | CPAN dist | |
ChaSen | Japanese morphological analyzer | For Japanese processing only. ** | 2.3.3 | >= 2.0x | chasen-2.3.3.tar.gz | Nara Institute of Science and Technology | Distribution Policy | For libchasen.a in ChaSen 2.02 or earlier, see below. |
Text::ChaSen | ChaSen Perl module | For Japanese processing only. ++ | 1.03 | <= | Text-ChaSen-1.03.tar.gz | NOKUBI Takatsugu | Text::ChaSen | |
MeCab | Yet another Japanese morphological analyzer | For Japanese processing only. ** | 0.93 | >= 0.6 | mecab-0.93.tar.gz | Taku Kudo | MeCab | Supported since Namazu 2.0.15; MeCab 0.90 and later are supported since Namazu 2.0.16. |
mecab-perl | MeCab Perl module | For Japanese processing only. ++ | 0.93 | >= 0.76 | mecab-perl-0.93.tar.gz | Taku Kudo | MeCab | Supported since Namazu 2.0.15; MeCab 0.90 and later are supported since Namazu 2.0.16. |
File::MMagic | File type detection | Included | 1.27 | >= 1.20 | File-MMagic-1.27.tar.gz | NOKUBI Takatsugu | CPAN dist | Bundled with the Namazu distribution. |
Perl modules (marked ++ in the table) are built and installed with:
perl Makefile.PL; make; make install
We recommend installing the Perl modules unless you have particular difficulty doing so.
(The notes below apply to Japanese processing only.)
Situation | Behavior |
---|---|
If you have everything ... | KAKASI is used for word segmentation by default. Specify the -c option to use ChaSen, or the -b option to use MeCab. |
If you have one or more ... | ./configure selects which one to use. (Specify -k to use KAKASI, -c to use ChaSen, or -b to use MeCab.) |
With ChaSen 2.02 or earlier, make install does not install /usr/local/lib/libchasen.a automatically. To build the Perl ChaSen module, you will need to copy it manually:
cp libchasen.a /usr/local/lib
ranlib /usr/local/lib/libchasen.a   # depending on your system
Since 2.0.6, the handling of environment variables has changed, and a new command-line option has been added to mknmz.
To use Namazu 2.0 in a Japanese environment, you may need to set environment variables for language selection.
In 2.0.5 and earlier, the same environment variables controlled both message translation and internal text processing.
Message translations | LANGUAGE | LC_ALL | LC_MESSAGES | LANG |
Text processing | LANGUAGE | LC_ALL | LC_MESSAGES | LANG |
In 2.0.6, this was changed as follows (variables are listed in order of precedence, highest first).
Message Translations | LANGUAGE | LC_ALL | LC_MESSAGES | LANG |
Text processing | LC_ALL | LC_CTYPE | LANG |
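The precedence rules above can be sketched as a small shell helper that reports which value each category would see. This is a minimal sketch, assuming the leftmost non-empty variable in each row wins and that C is used when none is set (matching the Lang_Msg: C and Lang: C lines that mknmz -C prints); the function names first_set, msg_lang and text_lang are hypothetical and not part of Namazu.

```shell
# Sketch (assumption): the leftmost non-empty variable in each list wins,
# and "C" is used when none of them is set.

# Print the value of the first non-empty variable among the given names.
first_set() {
    for name in "$@"; do
        eval "value=\${$name-}"
        if [ -n "$value" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1
}

# Message translations (2.0.6 and later): LANGUAGE, LC_ALL, LC_MESSAGES, LANG
msg_lang() { first_set LANGUAGE LC_ALL LC_MESSAGES LANG || echo C; }

# Text processing (2.0.6 and later): LC_ALL, LC_CTYPE, LANG
text_lang() { first_set LC_ALL LC_CTYPE LANG || echo C; }

# Usage: echo "messages: $(msg_lang), text processing: $(text_lang)"
```

Note how setting LANGUAGE changes only the message language, while text processing still follows LC_ALL, LC_CTYPE, or LANG.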
To process Japanese, a typical setting is one of the following values, depending on your system.
System | Value |
---|---|
Unix OS | ja |
Windows | ja_JP.SJIS |
The actual command to set the value depends on your shell:
C shell | Bourne shell, etc. |
---|---|
setenv LANG ja | LANG=ja; export LANG |
In the example above, LANG is set to ja, so all processing is done for Japanese (as long as none of the higher-priority variables is set).
Some systems may require ja_JP, ja_JP.eucJP, ja_JP.EUC, or ja_JP.ujis instead of just ja.
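To see which of these names your system actually provides, you can compare them against the output of locale -a. The helper below is a hypothetical sketch (pick_ja_locale is not part of Namazu): it reads installed locale names on standard input and prints the first candidate from the list above that matches. It ignores case, on the assumption that some systems report names such as ja_JP.eucjp.

```shell
# Hypothetical helper: read installed locale names (one per line) on stdin
# and print the first Japanese candidate the system provides.
pick_ja_locale() {
    installed=$(cat)
    for candidate in ja ja_JP ja_JP.eucJP ja_JP.EUC ja_JP.ujis; do
        # -q: no output, -i: ignore case, -x: whole-line match, -F: literal
        if printf '%s\n' "$installed" | grep -qixF -- "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage (sh/bash):
#   loc=$(locale -a | pick_ja_locale) && LANG=$loc && export LANG
```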
If these variables are not set properly when mknmz runs, the resulting index files will be malformed. For example, NMZ.w, which should contain one (Japanese) word per line, will instead contain long, unsegmented sentences. In that case, namazu and namazu.cgi will not return correct results.
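A rough way to spot this problem is to count unusually long lines in NMZ.w. This is a heuristic sketch, not a Namazu tool: count_long_words is a hypothetical name, and the 40-byte default threshold is an arbitrary assumption. A large count suggests the text was not segmented, but a few long entries may be normal.

```shell
# Heuristic sketch: count lines in an NMZ.w file that are longer than a
# threshold in bytes (default 40, an arbitrary assumption). NMZ.w should
# hold one word per line, so many long lines hint at unsegmented text.
count_long_words() {
    file=$1
    limit=${2:-40}
    # LC_ALL=C makes awk count bytes, independent of the current locale
    LC_ALL=C awk -v limit="$limit" 'length($0) > limit { n++ } END { print n + 0 }' "$file"
}

# Usage: count_long_words /tmp/index/NMZ.w
```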
Since 2.0.6, mknmz accepts the --indexing-lang=LANG option, which specifies the language used for text processing, for example --indexing-lang=ja. A value given on the command line overrides the environment variables.
If you wish to test mknmz before running make install, do the following:
cd namazu-2.0.x   (the directory where you unpacked the *.tar.gz)
env pkgdatadir=`pwd` scripts/mknmz   (csh/tcsh)
or
pkgdatadir=. scripts/mknmz   (sh/bash)
These commands make mknmz use the adjacent pl, filter, template, etc. directories instead of the installed copies under /usr/local/share/namazu.
(For details, see the $PKGDATADIR variable in mknmz and related scripts.)
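Before running scripts/mknmz this way, you may want to confirm that you are in the right directory. The check below is a hypothetical helper (check_srcdir is not part of Namazu) that only tests for the pl, filter, and template directories mentioned above; it does not verify their contents.

```shell
# Hypothetical pre-flight check: confirm the directories that
# scripts/mknmz refers to via pkgdatadir exist under DIR (default ".").
check_srcdir() {
    dir=${1:-.}
    status=0
    for sub in pl filter template; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing: $dir/$sub" >&2
            status=1
        fi
    done
    return $status
}

# Usage (sh/bash), from the unpacked namazu-2.0.x directory:
#   check_srcdir . && pkgdatadir=. scripts/mknmz -C
```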
As a first try, the following commands show the configuration, show the help, and create an index of ~/Mail in /tmp, respectively:
./mknmz -C
./mknmz --help
./mknmz -O /tmp ~/Mail
Running mknmz or namazu with no arguments displays a short usage message. Passing --help displays a long usage message. The -C option displays the current configuration. These three usages are worth remembering.
Argument | Meaning | Other Arguments |
---|---|---|
None | Short usage | No other arguments are given |
--help | Long usage | Other arguments are ignored |
-C | Current configuration | Other arguments remain in effect |
First, create an index.
(If you wish to run mknmz before make install, see "Test mknmz before make install".)
The format has changed slightly since version 1.4.0.8.
URI replacement is done by specifying the --replace option.
Alternatively, URI replacement can be done when namazu/namazu.cgi runs. In that case, run mknmz without the --replace option, and set up .namazurc so that namazu/namazu.cgi performs the replacement.
Run mknmz as follows:
mknmz [options] target-directory
This creates the index in the current directory. Use the -O option to specify a different output directory.
For example:
mkdir /tmp/index
mknmz -O /tmp/index \
    --replace='s#/foo/bar/doc/#http://foo.bar.jp/software/#' \
    /foo/bar/doc
mknmz prints messages like the following while it creates the index. If you wish to see the messages in Japanese, see the Japanese environment section.
14 files are found to be indexed.
1/14 - /foo/bar/acrobat3.pdf [application/pdf]
2/14 - /foo/bar/excel97.xls [application/excel]
3/14 - /foo/bar/html.html [text/html]
4/14 - /foo/bar/mail-multipart.txt [message/rfc822]
5/14 - /foo/bar/mail.txt [message/rfc822]
6/14 - /foo/bar/man.1 [text/x-roff]
7/14 - /foo/bar/msg00000.html [text/html; x-type=mhonarc]
8/14 - /foo/bar/plain.txt [text/plain]
9/14 - /foo/bar/plain.txt.Z [text/plain]
10/14 - /foo/bar/plain.txt.bz2 [text/plain]
11/14 - /foo/bar/plain.txt.gz [text/plain]
12/14 - /foo/bar/rfc0000.txt [text/plain; x-type=rfc]
13/14 - /foo/bar/tex.tex [application/x-tex]
14/14 - /foo/bar/word97.doc [application/msword]
Writing index files...
[Base]
Date: Thu Mar 16 22:14:01 2000
Added Documents: 14
Size (bytes): 58,701
Total Documents: 14
Added Keywords: 95
Total Keywords: 95
Wakati: module_kakasi -ieuc -oeuc -w
Time (sec): 14
File/Sec: 1.00
System: linux
Perl: 5.00503
Namazu: 2.0.X
The --replace option in the example above means: "documents under /foo/bar/doc/ are published as http://foo.bar.jp/software/, so rewrite their paths with a Perl substitution of the form s#aaa#bbb#."
(In this example, aaa is /foo/bar/doc/ and bbb is http://foo.bar.jp/software/.)
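You can preview what a substitution does to a path before indexing. mknmz applies --replace as a Perl substitution; this sketch uses sed instead, which gives the same result for a literal pattern like the one above but may differ for patterns that rely on Perl-specific regular-expression syntax. preview_replace is a hypothetical helper name.

```shell
# Preview a --replace rule on one path, using sed in place of Perl.
# Hypothetical helper; equivalent to mknmz's Perl substitution only for
# patterns that mean the same thing in sed's basic regular expressions.
preview_replace() {
    rule=$1
    path=$2
    printf '%s\n' "$path" | sed "$rule"
}

# Usage:
#   preview_replace 's#/foo/bar/doc/#http://foo.bar.jp/software/#' /foo/bar/doc/index.html
#   (prints http://foo.bar.jp/software/index.html)
```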
Namazu was originally developed to process HTML documents, but it can now handle various document formats. The filter scripts are in /usr/local/share/namazu/filter; see "Document filters" in the Namazu manual for a detailed explanation.
% mknmz ~/Mail/foobar
For mknmz command-line arguments, see the usage printed by mknmz --help. The -C option shows the current configuration, for example:
Loaded rcfile: /home/foobar/.mknmzrc
System: linux
Namazu: 2.0.X
Perl: 5.00503
File-MMagic: 1.27
NKF: module_nkf
KAKASI: module_kakasi -ieuc -oeuc -w
ChaSen: module_chasen -i e -j -F "%m "
MeCab: module_mecab -Owakati -b 8192
Wakati: module_kakasi -ieuc -oeuc -w
Lang_Msg: C
Lang: C
Coding System: euc
CONFDIR: /usr/local/etc/namazu
LIBDIR: /usr/local/share/namazu/pl
FILTERDIR: /usr/local/share/namazu/filter
TEMPLATEDIR: /usr/local/share/namazu/template
Supported media types: (42)
Unsupported media types: (2) marked with minus (-) probably missing application in your $path.
application/excel: excel.pl
application/gnumeric: gnumeric.pl
application/ichitaro5: taro56.pl
application/ichitaro6: taro56.pl
application/ichitaro7: taro7_10.pl
application/macbinary: macbinary.pl
application/msword: msword.pl
application/pdf: pdf.pl
application/postscript: postscript.pl
application/powerpoint: powerpoint.pl
application/rtf: rtf.pl
application/vnd.kde.kivio: koffice.pl
application/vnd.kde.kpresenter: koffice.pl
application/vnd.kde.kspread: koffice.pl
application/vnd.kde.kword: koffice.pl
application/vnd.oasis.opendocument.graphics: ooo.pl
application/vnd.oasis.opendocument.presentation: ooo.pl
application/vnd.oasis.opendocument.spreadsheet: ooo.pl
application/vnd.oasis.opendocument.text: ooo.pl
application/vnd.sun.xml.calc: ooo.pl
application/vnd.sun.xml.draw: ooo.pl
application/vnd.sun.xml.impress: ooo.pl
application/vnd.sun.xml.writer: ooo.pl
application/x-apache-cache: apachecache.pl
application/x-bzip2: bzip2.pl
application/x-compress: compress.pl
- application/x-deb: deb.pl
- application/x-dvi: dvi.pl
application/x-gzip: gzip.pl
application/x-js-taro: taro7_10.pl
application/x-rpm: rpm.pl
application/x-tex: tex.pl
application/x-zip: zip.pl
audio/mpeg: mp3.pl
message/news: mailnews.pl
message/rfc822: mailnews.pl
text/hnf: hnf.pl
text/html: html.pl
text/html; x-type=mhonarc: mhonarc.pl
text/html; x-type=pipermail: pipermail.pl
text/plain
text/plain; x-type=rfc: rfc.pl
text/x-hdml: hdml.pl
text/x-roff: man.pl
short name | long name | description |
---|---|---|
-F | --target-list=FILE | read the list of target files for index creation from FILE |
-t | --media-type=MTYPE | specify the document format of the target files |
| --allow=PATTERN | regular expression for file names to include |
| --deny=PATTERN | regular expression for file names to exclude |
| --exclude=PATTERN | regular expression for path names to exclude |
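For -F, the target list can be generated with find(1). This is a minimal sketch, assuming the list file holds one path per line; make_target_list and the file-name pattern in the usage comment are hypothetical choices for illustration.

```shell
# Hypothetical helper: list regular files under DIR whose names match
# PATTERN (a find -name glob), one path per line, sorted, into OUT.
make_target_list() {
    dir=$1
    pattern=$2
    out=$3
    find "$dir" -type f -name "$pattern" | sort > "$out"
}

# Usage:
#   make_target_list /foo/bar/doc '*.html' /tmp/targets.txt
#   mknmz -F /tmp/targets.txt -O /tmp/index
```

Building the list separately lets you review or edit it before indexing, whereas --allow, --deny, and --exclude filter files during the directory walk.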
To search documents, run:
% namazu query index
If you omit index, namazu uses /usr/local/var/namazu/index.
The namazu command is configured in namazurc. An example namazurc is provided as /usr/local/etc/namazu/namazurc-sample in the Namazu distribution.
To use namazu.cgi on the web, you need to configure your web server. For Apache, the relevant settings are:
Directive | Example setting | Purpose |
---|---|---|
ScriptAlias | /cgi-bin/ /usr/local/apache/cgi-bin/ | map /cgi-bin/ in URIs to this directory (Web administrator) |
AddHandler | cgi-script .cgi | execute files ending in ".cgi" as CGI |
AllowOverride | All | allow configuration in .htaccess (Web administrator) |
Options | ExecCGI | allow CGI execution |
DirectoryIndex | index.html | file to display when a URI names a directory |
Settings other than those marked (Web administrator) can be made in .htaccess. (Note that the Apache configuration may forbid these .htaccess settings.)
What is written here is not guaranteed; it simply introduces advanced usage that the developers have in mind.
Diagram: Original Document --(mknmz, preparation)--> Index --(namazu, search display)--> Search Result
Namazu builds an index of words before any search request arrives; when a request comes in, it searches the documents using that prepared index. This prepared data is called the index. In Namazu, the NMZ.* files are the index.
Index, Replace, Logging, Lang, Template
For further details, see the manual.
perl -MText::Kakasi -e ''
perl -MText::ChaSen -e ''
perl -MMeCab -e ''
perl -MNKF -e ''
If a command prints nothing, the corresponding Perl module is available. If you then run ./configure in the Namazu source directory, these Perl modules will be used.
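The four checks above can be wrapped in a loop that reports each module by name. This is a minimal sketch; check_perl_module is a hypothetical helper that simply runs perl -M with the module name and reports whether it succeeded.

```shell
# Hypothetical helper: report whether a Perl module can be loaded.
# perl -M<module> -e '' prints nothing and succeeds if the module loads.
check_perl_module() {
    if perl -M"$1" -e '' 2>/dev/null; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

# Usage:
#   for m in Text::Kakasi Text::ChaSen MeCab NKF; do check_perl_module "$m"; done
```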