path: root/inc/indexer.php
Commit message (Author, Date)
* Force search index update after fixing the lowercasing of words (Michael Hamann, 2011-06-14)
| This increases the indexer version to force a rebuild of the search index, which "repairs" an index that might contain uppercase words.
* Fix lowercasing of words in the indexer FS#2270 (Michael Hamann, 2011-06-14)
| On certain PHP installations (reproduced with PHP version 5.2.0-8+etch11) the indexer failed to lowercase words, so the fulltext search was partially broken.
* Fix variable name typo in indexer (Adrian Lang, 2011-05-23)
|
* Add more render/cache logic to the metadata code (Michael Hamann, 2011-05-08)
| This adds a new rendering limit of currently 5 pages to the p_get_metadata function. This means that in one request not more than 5 pages will be parsed/rendered. Pages for which the cache can be used aren't counted. This should make the new cache modes safe to use and should provide backwards compatibility while keeping the advantage of rendering metadata on demand (i.e. imagine one included page out of 10 is updated, then the metadata for that page can be rendered, but when you request a purge of the cache not 10 pages are rendered).
| In this commit most of the changes to the p_get_first_heading function are reverted and the title index is no longer used. This makes the first heading functionality no longer depend on the search index of DokuWiki. Maybe it can be added again later when the indexer provides a proper API for getting metadata values for all or selected pages.
| The performance of the p_get_first_heading function should be almost back to the performance in Anteater, as the simple cache of p_get_metadata is used and the limit of p_get_metadata is of course applied as well.
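The render limit described in this commit message can be illustrated with a small, language-neutral sketch. This is not DokuWiki's PHP code: the class name `MetadataStore`, the `render_page` callable and the choice to return `None` once the limit is reached are assumptions made for illustration. Only the limit of 5 and the rule that cache hits are not counted come from the commit message.

```python
RENDER_LIMIT = 5  # the limit named in the commit message


class MetadataStore:
    """Hypothetical stand-in for a metadata layer; not DokuWiki code."""

    def __init__(self, render_page):
        self._render_page = render_page  # callable: page id -> metadata dict
        self._cache = {}                 # page id -> already rendered metadata
        self._render_count = 0           # renders performed in this "request"

    def get_metadata(self, page_id):
        # Cache hits are free: they do not count against the limit.
        if page_id in self._cache:
            return self._cache[page_id]
        # Limit reached: refuse to render more pages in this request.
        # (Returning None here is an assumption of this sketch.)
        if self._render_count >= RENDER_LIMIT:
            return None
        self._render_count += 1
        meta = self._render_page(page_id)
        self._cache[page_id] = meta
        return meta
```

With one `MetadataStore` per request, asking for seven uncached pages renders only the first five, while repeated requests for an already rendered page stay cheap.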
* Add a test to do=check that should detect search index corruption (Michael Hamann, 2011-05-02)
| With this test it should be possible to detect whether the search index has been corrupted by using Rincewind RC or a git version from the weeks before the RC release.
* Add a force option to idx_addPage() (Michael Hamann, 2011-05-02)
|
* Add line endings at the end of the file (Michael Hamann, 2011-05-02)
| The bug that is fixed here may have corrupted your search index in a way that it produces wrong or missing results and won't be fixed automatically. This occurs when you have deleted the last occurrence of a word that has been on the last line of one of the word indexes. A check for a broken search index will be added.
| The index can be fixed by deleting it completely (remove all .idx files in data/index/) and recreating it using bin/indexer.php -c. The searchindex plugin will soon be updated to be able to do the same.
* Fix a warning in the indexer when words exist without corresponding index FS#2242 (Michael Hamann, 2011-05-02)
|
* Enable metadata rendering in the indexer (Michael Hamann, 2011-04-20)
| Metadata is now rendered in the indexer when its cache is invalid.
* Merge branch 'master' of https://github.com/akate/dokuwiki into akate-master (Michael Hamann, 2011-04-07)
|\
| * indexer fix updating the search index (Kate Arzamastseva, 2011-04-07)
| |
* | Clarify usage of some indexer methods (Tom N Harris, 2011-03-22)
| |
* | Change Doku_Indexer visibility from private to protected, and get rid of ugly underscores (Tom N Harris, 2011-03-22)
|/
* replace tokenizer_cmd with action hook (Andreas Gohr, 2011-03-19)
| as discussed at http://www.freelists.org/post/dokuwiki/tokenizer-cmd-in-indexer,1
* Remove relation_references from the index when it is missing (Michael Hamann, 2011-03-08)
|
* Merge the two indexer events and use string keys (Michael Hamann, 2011-03-06)
| This merges the INDEXER_PAGE_ADD and INDEXER_METADATA_INDEX events and introduces the new string keys 'page', 'body' and 'metadata' in the event data. All plugins that use INDEXER_PAGE_ADD need to be adjusted to use the key 'page' instead of 0 and 'body' instead of 1.
* Fix wildcard search (Tom N Harris, 2011-02-27)
|
* Restrict metadata values in indexer to string; skip unnecessary test (Tom N Harris, 2011-02-25)
|
* Reduce memory footprint of tokenizer; make returned arrays use contiguous keys (Tom N Harris, 2011-02-25)
|
* Fix pass by reference error, always return an array in lookupKey() (Michael Hamann, 2011-02-24)
|
* Merge branch 'master' into indexer_rewrite (Michael Hamann, 2011-02-24)
| Conflicts: inc/fulltext.php, inc/indexer.php, lib/exe/indexer.php
|\
| * ignore soft-hyphens for search FS#2049 (Andreas Gohr, 2011-02-06)
| | This makes it possible to find words that include soft-hyphens. However, search highlighting will not work and I have no idea how to make it work.
| * Add CJK characters to IDX_ASIAN2 - FS#2143 (Danny Lin, 2011-01-23)
| |
| * Activate the render parameter of p_get_metadata (Michael Hamann, 2011-01-10)
| | p_get_metadata has a $render parameter that has been disabled by the restructuring of metadata rendering. This change reactivates it so rendering metadata can be prevented. This is used, for example, in the search and in some plugins like indexmenu that use p_get_first_heading.
| | The default of the parameter has been changed to true, because otherwise the new caching structure won't work: almost all calls to p_get_metadata don't set the $render parameter. The indexer call to p_get_first_heading has been changed to set $render to true, as in the indexer only one page will be rendered and the title in the index should really be the current one.
| | This does not fix the problem that rendering pages with lots of links or displaying the index can cause the parsing/rendering of a lot of pages.
| * Remove enc=utf-8 in VIM modeline as it is not allowed in VIM 7.3 (Michael Hamann, 2010-11-29)
| | As of VIM 7.3 it is no longer possible to specify the encoding in the modeline. This gives an error message whenever such a file is opened, thus this commit removes the enc setting from the modeline.
* | Add minimum length option to index histogram (Tom N Harris, 2011-02-23)
| |
* | Increase version tag for new indexer (Tom N Harris, 2011-02-23)
| |
* | Fix variable name typo in indexer (Tom N Harris, 2011-02-22)
| |
* | Implement histogram method of indexer (Tom N Harris, 2011-02-22)
| |
* | Indexer version tag should include plugin names (Tom N Harris, 2011-02-22)
| |
* | Removing a page from the index deletes related metadata. Cache key names in index. (Tom N Harris, 2011-02-22)
| |
* | Indexer::lookupKey callback receives value reference as first arg (Tom N Harris, 2011-02-22)
| |
* | Special handling of title metadata index (Tom N Harris, 2011-02-18)
| |
* | Merge remote-tracking branch 'my-fork/master' into indexer_improvements (Michael Hamann, 2011-02-02)
|\ \
| * | Indexer Rewrite v3: wildcards in lookupKey and automatically unwrap single result (Tom N Harris, 2011-01-24)
| | |
| * | Indexer v3 Rewrite: streamline indexing of deleted or disabled pages (Tom N Harris, 2011-01-24)
| | |
* | | Add INDEXER_VERSION_GET event so plugins can add their version (Michael Hamann, 2011-01-23)
| | | This allows plugins to add their own version strings like plugin_tag=1 so pages can be reindexed when plugins update their index content.
|/ /
* | Indexer v3 Rewrite: Use the metadata index for backlinks; add INDEXER_METADATA_INDEX event (Michael Hamann, 2011-01-23)
| | This new event allows plugins to add or modify the metadata that will be indexed. Collecting this metadata in an event allows plugins to see if other plugins have already added the metadata they need and leads to just one single indexer call, so fewer files are read and written. Plugins could also replace or prevent the metadata indexer call using this event.
* | Indexer v3 Rewrite: fix addMetaKeys and locking (Michael Hamann, 2011-01-23)
| | This fixes addMetaKeys so it actually removes values. This also changes the functionality of the function: it now updates the key for the page with the current value instead of adding new values, as this will be the default use case. A new parameter could be added to restore the "old" behavior when needed.
| | addMetaKeys now only saves the index when the content has really been changed. Furthermore, an empty number is no longer added to the reverse index when it was previously empty.
| | addMetaKeys now releases the lock again and really fails when the lock can't be gained.
* | Indexer v3 Rewrite: implement lookupKey() (Michael Hamann, 2011-01-22)
| | Saving and looking up metadata key/value pairs seems to work now at least with some basic tests.
* | Indexer v3 Rewrite: _saveIndexKey now really writes on the desired line (Michael Hamann, 2011-01-22)
| | Now _saveIndexKey inserts empty lines when the index isn't long enough. This is necessary because the page ids are taken from the global page index, but not every page has an entry in the metadata-key-specific index, so e.g. line 10 might be the first entry in the index.
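The padding behavior described here can be sketched in a few lines. This is an illustration of the idea only, not the PHP implementation of _saveIndexKey: the function name `save_index_key`, the in-memory list standing in for the index file, and the 0-based line numbering are all assumptions.

```python
def save_index_key(lines, line_no, value):
    """Store value at line_no (0-based, an assumption of this sketch).

    `lines` is an in-memory list standing in for the lines of an index
    file. If the list is too short, it is padded with empty lines so the
    value lands on exactly the requested line.
    """
    while len(lines) <= line_no:
        lines.append("")
    lines[line_no] = value
    return lines


# Usage: the first entry ever written may belong to line 10.
index = save_index_key([], 10, "some value")
```

Padding keeps line numbers aligned with the page ids from the global page index, which is the reason the commit message gives for inserting empty lines.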
* | Indexer v3 Rewrite: fix obvious typos and type errors (Michael Hamann, 2011-01-22)
| |
* | Indexer v3 Rewrite part two, update uses of indexer (Tom N Harris, 2010-12-29)
| |
* | Indexer v3 Rewrite part one (unstable) (Tom N Harris, 2010-12-27)
| | The indexer functions have been converted to a class interface. Use the Doku_Indexer class to access the indexer with these public methods:
| |     addPageWords
| |     addMetaKeys
| |     deletePage
| |     tokenizer
| |     lookup
| |     lookupKey
| |     getPages
| |     histogram
| | These functions are provided for general use:
| |     idx_get_version
| |     idx_get_indexer
| |     idx_get_stopwords
| |     idx_addPage
| |     idx_lookup
| |     idx_tokenizer
| | These functions are still available, but are deprecated:
| |     idx_getIndex
| |     idx_indexLengths
| | All other old idx_ functions are unsupported and have been removed.
* | Merge branch 'tokenizer-rewrite' into michitux (Tom N Harris, 2010-11-20)
|\ \
| * | Restore io_runcmd, use io_exec for exec with pipes (Tom N Harris, 2010-11-18)
| | |
| * | Use a different indexer version when external tokenizer is enabled (Tom N Harris, 2010-11-17)
| | |
| * | Use external program to split pages into words (Tom N Harris, 2010-11-16)
| | | An external tokenizer inserts extra spaces to mark words in the input text. The text is sent through STDIN and STDOUT file handles. A good choice for Chinese and Japanese is MeCab (http://sourceforge.net/projects/mecab/) with the command line 'mecab -O wakati'.
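The STDIN/STDOUT protocol described in this commit message can be sketched as follows. This is not the DokuWiki implementation (which is PHP and uses io_exec): the function name `tokenize_external` and the stand-in command `STAND_IN` are assumptions for illustration. The stand-in simply puts a space between every character so the sketch runs without MeCab installed.

```python
import subprocess
import sys


def tokenize_external(text, command):
    """Send text to `command` on STDIN and split its STDOUT on whitespace.

    The external program is expected to insert extra spaces that mark
    word boundaries, as the commit message describes.
    """
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise if the external tokenizer exits with an error
    )
    return result.stdout.split()


# Hypothetical stand-in tokenizer: separates every character with a space.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(' '.join(sys.stdin.read()))",
]

words = tokenize_external("abc", STAND_IN)
```

With MeCab installed, the command from the commit message would presumably be passed as `["mecab", "-O", "wakati"]` instead of the stand-in.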
| * | tokenizer was returning prematurely (Tom N Harris, 2010-11-15)
| | |
| * | Refactor tokenizer to avoid splitting multiple times (Tom N Harris, 2010-11-14)
| | |