| Commit message | Author | Age |
|
Conflicts:
inc/fulltext.php
inc/indexer.php
lib/exe/indexer.php
|
This makes it possible to find words that include soft hyphens. However,
search highlighting will not work and I have no idea how to make it work.
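
A minimal sketch of the idea, written in Python rather than DokuWiki's PHP: dropping the soft hyphen (U+00AD) before tokenizing lets a word match whether or not it contains one. The names `strip_soft_hyphens` and `tokenize` are hypothetical and the whitespace tokenizer is a simplification.

```python
SOFT_HYPHEN = "\u00ad"  # invisible hyphenation hint, U+00AD


def strip_soft_hyphens(text):
    """Remove soft hyphens so hyphenated and plain spellings are identical."""
    return text.replace(SOFT_HYPHEN, "")


def tokenize(text):
    """Very simplified tokenizer: strip soft hyphens, lowercase, split."""
    return strip_soft_hyphens(text).lower().split()
```

Because the stored page text still contains the soft hyphens, a literal match of the stripped search term against the page would presumably fail, which is consistent with the highlighting limitation noted above.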
|
p_get_metadata has a $render parameter that has been disabled by the
restructuring of metadata rendering. This change reactivates it so
rendering metadata can be prevented. This is e.g. used in the search and
in some plugins like indexmenu that use p_get_first_heading. The default
of the parameter has been changed to true as otherwise the new caching
structure won't work as almost all calls to p_get_metadata don't set the
$render parameter.
The indexer call to p_get_first_heading has been changed to set $render
to true as in the indexer only one page will be rendered and the title
in the index should really be the current one.
This does not fix the problem that rendering pages with lots of links or
displaying the index can cause the parsing/rendering of a lot of pages.
|
As of VIM 7.3 it is no longer possible to specify the encoding in the
modeline. This gives an error message whenever such a file is opened,
thus this commit removes the enc setting from the modeline.
|
index.
|
result
|
This allows plugins to add their own version strings like
plugin_tag=1 so pages can be reindexed when plugins update their index
content.
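
One way to picture this, as a Python sketch under stated assumptions: the index version becomes a string built from the core version plus one `name=version` entry per plugin, and a page is reindexed when the string stored for it differs. The `+` separator and the names `index_version` and `needs_reindex` are assumptions made for illustration, not taken from the commit.

```python
def index_version(core_version, plugin_versions):
    """Build a version string such as '5+plugin_tag=1' (format is assumed)."""
    parts = [str(core_version)]
    for name in sorted(plugin_versions):  # sorted for a stable string
        parts.append(f"{name}={plugin_versions[name]}")
    return "+".join(parts)


def needs_reindex(stored_version, core_version, plugin_versions):
    """A page is stale when its stored version string no longer matches."""
    return stored_version != index_version(core_version, plugin_versions)
```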
|
INDEXER_METADATA_INDEX event
This new event allows plugins to add or modify the metadata that will be
indexed. Collecting this metadata in an event lets plugins see whether
other plugins have already added the metadata they need, and results in
a single indexer call, so fewer files are read and written.
Plugins could also replace or prevent the metadata indexer call using this
event.
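
The collecting pattern can be sketched as follows. This is Python, not the DokuWiki event API: `collect_index_metadata` and both handlers are hypothetical, and the shape of the event data (a dict with `page` and `metadata`) is an assumption.

```python
def collect_index_metadata(page_id, base_metadata, handlers):
    """Pass one shared data structure through every handler in turn."""
    data = {"page": page_id, "metadata": dict(base_metadata)}
    for handler in handlers:
        handler(data)  # each handler sees what earlier handlers added
    return data["metadata"]


def tag_handler(data):
    """Hypothetical plugin that contributes a 'subject' key."""
    data["metadata"].setdefault("subject", ["wiki", "search"])


def fallback_handler(data):
    """Hypothetical plugin that only adds the key if nobody else did."""
    if "subject" not in data["metadata"]:
        data["metadata"]["subject"] = ["fallback"]
```

The collected dictionary would then be handed to the indexer once, instead of each plugin triggering its own write.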
|
This fixes addMetaKeys so it actually removes values. This also changes
the functionality of the function: It now updates the key for the page
with the current value instead of adding new values as this will be the
default use case. A new parameter could be added to restore the "old"
behavior when needed.
addMetaKeys now only saves the index when the content has really been
changed.
Furthermore no empty number is added anymore to the reverse index when
it has been empty previously.
addMetaKeys now releases the lock again and really fails when the lock
can't be acquired.
|
Saving and looking up metadata key/value pairs seems to work now at
least with some basic tests.
|
Now _saveIndexKey inserts empty lines when the index isn't long enough.
This is necessary because the page ids are taken from the global page
index, but not every page has an entry in the metadata-key-specific
index, so e.g. line 10 might be the first entry in the index.
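
A minimal Python sketch of the padding step, assuming the index is held as a list of lines and page ids are 0-based line numbers. The function name `save_index_key` is hypothetical and only mirrors the idea of _saveIndexKey.

```python
def save_index_key(lines, page_no, value):
    """Store value on line page_no, padding with empty lines if needed."""
    while len(lines) <= page_no:
        lines.append("")  # placeholder for pages without an entry
    lines[page_no] = value
    return lines
```

For an empty index, `save_index_key([], 3, "x")` yields three empty lines followed by the entry, so the line number still matches the page id from the global page index.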
|
The indexer functions have been converted to a class interface.
Use the Doku_Indexer class to access the indexer with these public methods:
addPageWords
addMetaKeys
deletePage
tokenizer
lookup
lookupKey
getPages
histogram
These functions are provided for general use:
idx_get_version
idx_get_indexer
idx_get_stopwords
idx_addPage
idx_lookup
idx_tokenizer
These functions are still available, but are deprecated:
idx_getIndex
idx_indexLengths
All other old idx_ functions are unsupported and have been removed.
|
An external tokenizer inserts extra spaces to mark words in the input text.
The text is sent through STDIN and STDOUT file handles.
A good choice for Chinese and Japanese is MeCab.
http://sourceforge.net/projects/mecab/
With the command line 'mecab -O wakati'
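
The pipe-based exchange can be sketched in Python as below. This is an illustration, not the DokuWiki implementation: `external_tokenize` is a hypothetical name, and `STAND_IN` is a substitute command so the sketch runs without MeCab installed.

```python
import subprocess
import sys


def external_tokenize(text, command):
    """Send text to the tokenizer's STDIN and split its STDOUT on spaces."""
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise if the external program fails
    )
    return result.stdout.split()


# Stand-in tokenizer (assumption, for demonstration only): it turns '|'
# into spaces. With MeCab available, the command from the commit message
# would be ["mecab", "-O", "wakati"].
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().replace('|', ' '))",
]
```

Calling `external_tokenize("full|text|search", STAND_IN)` returns the three words as separate tokens; a real tokenizer would insert the spaces at word boundaries it detects itself.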
|
When updating a single line, that line was split into an array; a loop
over that array removed one entry and a new one was added afterwards.
Tests have shown that doing this with a regex is much faster, which is
easily explained: the regex is very simple to match, while a loop over
an array isn't that fast. As the update function is called for every
word in a page, the impact of this change is significant.
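
A Python sketch of the regex approach, assuming an index line consists of `pageid*count` entries joined by `:` (this line format and the name `update_index_line` are assumptions for illustration).

```python
import re


def update_index_line(line, page_id, count):
    """Replace the entry for page_id in one index line using a regex."""
    # Remove an existing "page_id*N" entry, anchored at line start or ':'
    # so that page 2 does not match inside the entry for page 12.
    pattern = r"(^|:)" + re.escape(str(page_id)) + r"\*\d*"
    line = re.sub(pattern, "", line).strip(":")
    if count:
        entry = f"{page_id}*{count}"
        line = f"{line}:{entry}" if line else entry
    return line
```

A count of 0 simply removes the page from the line, which covers the case of a word that no longer occurs on the page.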
|
This adds a simple boolean variable that tracks whether new words have
been added. When editing a page, in many cases all words have already
been used somewhere else, or just one or two words are new. Until this
change, every word index that was read was also written back; now only
the changed ones are written. The overhead of the new boolean variable
should be low.
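
The dirty-flag pattern can be sketched in Python like this. The names `add_words` and `save_if_changed` are hypothetical, and the word index is simplified to a plain list.

```python
def add_words(index, words):
    """Append unseen words to index; return True only if it changed."""
    known = set(index)
    changed = False
    for word in words:
        if word not in known:
            index.append(word)
            known.add(word)
            changed = True
    return changed


def save_if_changed(index, words, write):
    """Call the (possibly expensive) write only when something is new."""
    if add_words(index, words):
        write(index)
        return True
    return False
```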
|
In PHP versions newer than 4.3.0 fgets reads a whole line regardless of
its length when no length is given. Thus the loop in _freadline isn't
needed. This increases the speed significantly as _freadline was called
very often.
|
From my experience with a benchmark of the indexer it is faster to first
join the array of all index entries and then write them back together
instead of writing every single entry. This might increase memory usage,
but I couldn't see a significant increase and this function is also only
used for the small index files, not for the large pagewords index.
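
In Python terms the change corresponds roughly to the sketch below: build the whole file content in memory, then issue a single write. The name `save_index` is hypothetical, and no benchmark figures are implied.

```python
def save_index(lines, stream):
    """Join all entries first, then pass them to the stream in one write."""
    stream.write("".join(line + "\n" for line in lines))
```

The trade-off is the one the message names: the joined string exists in memory alongside the list, which is acceptable for small index files.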
|
Every search on the wiki uses this function. Scanning the index directory each time is time consuming because it causes a constant series of disk accesses.
A normal search now calls file_exists one or more times instead of reading the whole directory with readdir.
A wildcard search now uses a lengths.idx file that lists all the word lengths used in the wiki. The file is generated if the new configuration parameter $conf[readdircache] is not 0; its value is a time in seconds. A new function, idx_listIndexLengths, does this part.
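
Both strategies can be sketched in Python as follows. This is an illustration under assumptions, not the PHP code: the `i<length>.idx` naming, the one-number-per-line cache format, and both function names are hypothetical here.

```python
import os
import re
import time


def existing_index_files(index_dir, lengths):
    """Normal search: probe specific files instead of listing the directory."""
    return [n for n in lengths
            if os.path.exists(os.path.join(index_dir, f"i{n}.idx"))]


def list_index_lengths(index_dir, cache_seconds):
    """Wildcard search: word lengths in use, cached in lengths.idx."""
    cache = os.path.join(index_dir, "lengths.idx")
    if cache_seconds and os.path.exists(cache):
        if time.time() - os.path.getmtime(cache) < cache_seconds:
            with open(cache, encoding="utf-8") as fh:
                return [int(item) for item in fh.read().split()]
    # Cache disabled, missing, or stale: fall back to one directory scan.
    matches = (re.fullmatch(r"i(\d+)\.idx", name)
               for name in os.listdir(index_dir))
    lengths = sorted(int(m.group(1)) for m in matches if m)
    if cache_seconds:
        with open(cache, "w", encoding="utf-8") as fh:
            fh.write("\n".join(str(n) for n in lengths) + "\n")
    return lengths
```

A cache lifetime of 0 disables the cache file entirely, mirroring the "not 0" condition on the configuration parameter.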
|
Classes are loaded through PHP5's class autoloader; all other
includes are just loaded by default. This skips a lot of
require_once calls.
Parser and Plugin stuff isn't handled by the class loader yet.
|
Changes of behaviour are:
* Allow the user name, title & description “0”
* Default to Port 443 if using HTTPS
* Set $INFO['isadmin'] and $INFO['ismanager'] to “false” even if no user is
logged in
* Do not pass empty fragment field in the event data for event
ACTION_SHOW_REDIRECT
* Handle chunked encoding in HTTPClient
darcs-hash:20091104100115-e4919-5cf6397d4a457e3f98a8ca49fbdab03f2147721d.gz
|
Ignore-this: 259cb5773c3144c6c706d87298dcf674
darcs-hash:20091020212338-7ad00-6bf1c5c403491f136a1c02af5ecd9f84d7227107.gz
|
darcs-hash:20090119190920-7ad00-5409285ea5c44379fec906d08f5ccb710eac5b6d.gz
|
darcs-hash:20090118200357-7ad00-2d3a8dcb57ef5d19efe65fd4af8c26af261aef06.gz
|
darcs-hash:20081226183403-7ad00-1a4d08ab0f674eb3dcda131dd49ddaeb27129ad6.gz
|
darcs-hash:20081213090400-7ad00-4e21cd75978bb07513f32f5d750658e8d777c59e.gz
|
Currently the min. token length is 3 (note, this doesn't apply to numeric tokens).
The value set in inc/indexer.php can be overridden by defining IDX_MINWORDLENGTH
elsewhere (e.g. conf/local.protected.php).
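
The filtering rule can be sketched in Python as below. `filter_tokens` is a hypothetical name, and `str.isdigit` is a simplification of whatever numeric test the PHP code applies.

```python
IDX_MINWORDLENGTH = 3  # default from the commit message; overridable


def filter_tokens(tokens, min_length=IDX_MINWORDLENGTH):
    """Drop tokens shorter than min_length, but always keep numeric ones."""
    return [t for t in tokens if t.isdigit() or len(t) >= min_length]
```

With the default, `filter_tokens(["a", "to", "wiki", "42", "7"])` keeps "wiki" and both numbers; passing `min_length=2` plays the role of overriding the constant.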
darcs-hash:20081207161129-f07c6-6432947fe5d74666409d1e00222eaa489374c32f.gz
|
This patch makes the highlighting of phrases in search snippets and on
the pages themselves much better.
Now a regexp gets passed to the ?s
darcs-hash:20080215174653-7ad00-cd2d6f7d408db7b7dd3cb9974c3eb27f3a9baeac.gz
|
darcs-hash:20071012000327-6942e-bdef26ce258dea0229ad8b8dbbc7c089dea880ad.gz
|