path: root/inc/indexer.php
Commit message | Author | Age
* Activate the render parameter of p_get_metadata | Michael Hamann | 2011-01-10
  p_get_metadata has a $render parameter that was disabled by the restructuring of metadata rendering. This change reactivates it so that rendering metadata can be prevented. This is used, e.g., in the search and in some plugins, such as indexmenu, that use p_get_first_heading. The default of the parameter has been changed to true; otherwise the new caching structure would not work, because almost no calls to p_get_metadata set the $render parameter. The indexer call to p_get_first_heading has been changed to set $render to true, because the indexer renders only one page and the title in the index should really be the current one. This does not fix the problem that rendering pages with many links, or displaying the index, can trigger the parsing and rendering of many pages.
* Remove enc=utf-8 in VIM modeline as it is not allowed in VIM 7.3 | Michael Hamann | 2010-11-29
  As of VIM 7.3 it is no longer possible to specify the encoding in the modeline. Doing so produces an error message whenever such a file is opened, so this commit removes the enc setting from the modeline.
* removed deprecated index update function | Andreas Gohr | 2010-10-18
* Perform quick search in title as well | Adrian Lang | 2010-06-16
* Limiting use of readdir in the idx_indexLengths function (v2). | YoBoY | 2010-03-24
  Every search on the wiki uses this function. Scanning the index directory each time is time-consuming, because it causes a constant series of disk accesses. A normal search now calls file_exists one or more times instead of reading the whole directory with readdir. A wildcard search now uses a lengths.idx file containing all the word lengths used in the wiki; this file is generated if the new configuration parameter $conf[readdircache] is not 0 and is set to a time in seconds. A new function, idx_listIndexLengths, handles this part.
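A minimal sketch of the two lookup paths described above, assuming word-list files are named w<len>.idx (as in the 2006-11-12 Word-Length Indexer entry) and that the cached lengths.idx is reused while it is younger than the configured number of seconds. The function names `list_index_lengths` and `has_length` are hypothetical, not DokuWiki's idx_listIndexLengths, and the expiry rule is an assumption.

```python
import os
import re
import time

# Hypothetical sketch: file layout and cache expiry are assumptions.
WORDFILE_RE = re.compile(r"^w(\d+)\.idx$")


def list_index_lengths(index_dir: str, readdircache: int) -> list[int]:
    """Return the sorted word lengths that have a word-list file (w<len>.idx).

    If readdircache > 0, the result is cached in lengths.idx and reused
    while that file is younger than readdircache seconds.
    """
    cache = os.path.join(index_dir, "lengths.idx")
    if readdircache > 0 and os.path.exists(cache):
        if time.time() - os.path.getmtime(cache) < readdircache:
            with open(cache, encoding="utf-8") as fh:
                return [int(line) for line in fh if line.strip()]
    # Cache missing, stale or disabled: scan the directory once.
    lengths = sorted(
        int(m.group(1))
        for name in os.listdir(index_dir)
        if (m := WORDFILE_RE.match(name))
    )
    if readdircache > 0:
        with open(cache, "w", encoding="utf-8") as fh:
            fh.write("".join(f"{n}\n" for n in lengths))
    return lengths


def has_length(index_dir: str, length: int) -> bool:
    """Non-wildcard search: a single existence check, no directory scan."""
    return os.path.exists(os.path.join(index_dir, f"w{length}.idx"))
```

The design trade-off is staleness: while the cache is fresh, a newly created word-length file is not seen by wildcard searches.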
* first attempt to centralize all include loading | Andreas Gohr | 2010-01-31
  Classes are loaded through PHP5's class autoloader; all other includes are simply loaded by default. This skips a lot of require_once calls. Parser and plugin code isn't handled by the class loader yet.
* fixed file header | Andreas Gohr | 2010-01-31
* Emit fewer E_NOTICEs and E_STRICTs | Adrian Lang | 2009-11-04
  Changes of behaviour are:
  * Allow the user name, title & description “0”
  * Default to port 443 if using HTTPS
  * Set $INFO['isadmin'] and $INFO['ismanager'] to “false” even if no user is logged in
  * Do not pass an empty fragment field in the event data for event ACTION_SHOW_REDIRECT
  * Handle chunked encoding in HTTPClient
  darcs-hash:20091104100115-e4919-5cf6397d4a457e3f98a8ca49fbdab03f2147721d.gz
* Coding Standard Cleanup | Andreas Gohr | 2009-10-20
  Ignore-this: 259cb5773c3144c6c706d87298dcf674
  darcs-hash:20091020212338-7ad00-6bf1c5c403491f136a1c02af5ecd9f84d7227107.gz
* Changed minimum word length for fulltext index to 2 | Andreas Gohr | 2009-01-19
  darcs-hash:20090119190920-7ad00-5409285ea5c44379fec906d08f5ccb710eac5b6d.gz
* fixed indexer which was broken by miscalculation in previous optimization | Andreas Gohr | 2009-01-18
  darcs-hash:20090118200357-7ad00-2d3a8dcb57ef5d19efe65fd4af8c26af261aef06.gz
* minor optimizations in the fulltext indexing methods | Andreas Gohr | 2008-12-26
  darcs-hash:20081226183403-7ad00-1a4d08ab0f674eb3dcda131dd49ddaeb27129ad6.gz
* removed some illogical path setups | Andreas Gohr | 2008-12-13
  darcs-hash:20081213090400-7ad00-4e21cd75978bb07513f32f5d750658e8d777c59e.gz
* Change search index min. token length to a define (IDX_MINWORDLENGTH) | Chris Smith | 2008-12-07
  Currently the minimum token length is 3 (note: this doesn't apply to numeric tokens). The value set in inc/indexer.php can be overridden by defining IDX_MINWORDLENGTH elsewhere (e.g. conf/local.protected.php).
  darcs-hash:20081207161129-f07c6-6432947fe5d74666409d1e00222eaa489374c32f.gz
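The length rule reduces to a simple filter. A sketch, assuming "numeric token" means a run of ASCII digits and that length is counted in characters; the default of 3 comes from this commit (the 2009-01-19 entry later lowered it to 2), and `filter_tokens` is a hypothetical name.

```python
import re

# Default from the commit text; in DokuWiki this is the PHP define
# IDX_MINWORDLENGTH, which can be set earlier (e.g. conf/local.protected.php).
IDX_MINWORDLENGTH = 3
# Assumption: "numeric" means one or more ASCII digits.
NUMERIC = re.compile(r"[0-9]+")


def filter_tokens(tokens, min_len=IDX_MINWORDLENGTH):
    """Keep tokens of at least min_len characters; numeric tokens always pass."""
    return [t for t in tokens if NUMERIC.fullmatch(t) or len(t) >= min_len]
```

For example, `filter_tokens(["a", "of", "the", "42", "7", "wiki"])` keeps `"the"`, `"42"`, `"7"` and `"wiki"`.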
* better highlighting for phrase searches FS#1193 | Andreas Gohr | 2008-02-15
  This patch makes the highlighting of phrases in search snippets and on the pages themselves much better. Now a regexp gets passed to the ?s
  darcs-hash:20080215174653-7ad00-cd2d6f7d408db7b7dd3cb9974c3eb27f3a9baeac.gz
* Reduce memory requirement for indexer | Tom N Harris | 2007-10-12
  darcs-hash:20071012000327-6942e-bdef26ce258dea0229ad8b8dbbc7c089dea880ad.gz
* add page_exists function (inc/pageutils.php) | Chris Smith | 2007-09-30
  bool page_exists($id, $rev
  darcs-hash:20070930021040-d26fc-e3847bfdd20a36154685262eca94211cfd461e83.gz
* Remove extraneous print statement | Tom N Harris | 2007-10-01
  darcs-hash:20071001192639-6942e-f7abb7a91f0b3d9c42267df233815debbdd5ad58.gz
* don't use realpath() anymore (FS#1261 and others) | Andreas Gohr | 2007-09-30
  The use of realpath() to clean up relative file names caused trouble in certain setups that rely on symlinks or have a restrictive file structure. This patch replaces all realpath() calls with a PHP-only replacement, which should solve those problems.
  darcs-hash:20070930184250-7ad00-512ff04c95f57fc9eaf104f80372237a3c94286f.gz
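The commit does not show the replacement function. The general idea of a realpath() substitute can be sketched as purely lexical cleanup: `.` and `..` segments are resolved by string manipulation, without consulting the filesystem, so symlinks are never expanded. This is an illustration in Python with a hypothetical name, not DokuWiki's PHP code.

```python
def clean_path(path: str) -> str:
    """Resolve '.', '..' and repeated slashes without touching the filesystem."""
    absolute = path.startswith("/")
    parts: list[str] = []
    for seg in path.split("/"):
        if seg in ("", "."):
            continue  # skip empty segments (from '//') and current-dir markers
        if seg == "..":
            if parts and parts[-1] != "..":
                parts.pop()  # step back out of the previous directory
            elif not absolute:
                parts.append("..")  # relative path climbing above its start
            # '..' above the root of an absolute path is simply dropped
            continue
        parts.append(seg)
    result = "/".join(parts)
    if absolute:
        return "/" + result
    return result or "."
```

Because the result depends only on the string, `/var/www/./dokuwiki/../data//pages` becomes `/var/www/data/pages` even if `dokuwiki` is a symlink, which is exactly where realpath() and a lexical cleanup differ.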
* Remove obsolete words from search index | Tom N Harris | 2007-09-19
  Creates another index file, 'pagewords.idx', for the words in each page. Words that are deleted from a page can then be removed from the word index. The indexer version is incremented to force rebuilding of the index. Also, a minor flaw in the regexp for Asian words is fixed.
  darcs-hash:20070919194244-6942e-2e08157dcf4fdf166b35b36a0faf8a3dfb415ad9.gz
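A simplified in-memory model of the bookkeeping this entry describes: the stored per-page word set (the role pagewords.idx plays) is diffed against the freshly tokenized words, and only the difference is applied to the word index. The function name, the dict-of-sets layout, and deleting words that no longer occur on any page are assumptions for illustration; the real index files are line-based and store more than page membership.

```python
def update_page_words(page: str, new_words: set[str],
                      pagewords: dict[str, set[str]],
                      word_index: dict[str, set[str]]) -> set[str]:
    """Apply a page's word changes to the word index; return removed words.

    pagewords:  page -> words last indexed for that page
    word_index: word -> pages containing that word
    """
    old_words = pagewords.get(page, set())
    removed = old_words - new_words
    for word in removed:
        pages = word_index.get(word)
        if pages is not None:
            pages.discard(page)
            if not pages:
                del word_index[word]  # word no longer appears anywhere
    for word in new_words - old_words:
        word_index.setdefault(word, set()).add(page)
    pagewords[page] = set(new_words)
    return removed
```

Without the per-page list, the indexer could add words for a page but would have no cheap way to know which old entries to take back out.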
* spelling fix FS#1220 | Andreas Gohr | 2007-08-09
  darcs-hash:20070809212154-7ad00-bde57d95f9b61840f1cdac4d60f89bcd0ae83c4a.gz
* fix Asian word search FS#1188 | Andreas Gohr | 2007-07-18
  darcs-hash:20070718073121-7ad00-60e45fb3913fa3745511c640a55aa1b7446a3657.gz
* fix pass by reference problem in indexer.php | Andreas Gohr | 2007-03-01
  darcs-hash:20070301211751-7ad00-d4212a363176501a31a0971a00f81e18ee00fab3.gz
* INDEXER_PAGE_ADD event | Esther Brunner | 2007-02-27
  darcs-hash:20070227124424-20862-78b4e1863830e88aa9564e6b9c58fa0cdf03d41c.gz
* sorted indexer is now default | Andreas Gohr | 2007-02-26
  darcs-hash:20070226175529-7ad00-4d3d984da1edbf2ded546cfbd7374f97f032d032.gz
* Indexer Asian language fixes and speed-ups | Tom N Harris | 2006-11-17
  Make Chinese and Japanese work better with the new indexer. Some missing punctuation was added to utf8_stripspecials. Miscellaneous other changes make indexing faster. The indexes will expire on backend upgrades, so you don't have to delete *.indexed
  darcs-hash:20061117123032-6942e-774b38e08234928c49b37e40addba375acf67ac0.gz
* bracket fix in inc/indexer.php | Andreas Gohr | 2006-11-14
  darcs-hash:20061114210440-7ad00-841acaf84e77e7bea16b96317531bd502ee44938.gz
* fixes for stricter PHP5 typing (bug#978) | chris | 2006-11-13
  darcs-hash:20061113122645-9b6ab-e5f5be2e88eea7eb00643e6a5210086f46191c30.gz
* Word-Length Indexer | TNHarris | 2006-11-12
  A modification to the indexer that sorts words based on length. This should make searching a little more efficient. After the patch is applied, your old index will be automatically converted to the new format (when you visit a page). The new index format is:
  1. Index files are stored in savedir/index.
  2. Word lists are stored as wlen.idx. This used to be word.idx.
  3. Word indexes are stored as ilen.idx. This used to be index.idx.
  4. The page list, page.idx, is simply copied to the new location.
  Any plugins you have, such as the blog plugin, that read the index files need to be updated.
  darcs-hash:20061112194900-2b9f0-a975498ccf0a1d39c6df73b79bcd028d5e81c389.gz
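A sketch of why partitioning by length helps, assuming "len" in wlen.idx and ilen.idx stands for the word length (w3.idx, i3.idx, ...) and that length is counted in characters. This is an in-memory model with hypothetical helper names, not DokuWiki's file handling.

```python
from collections import defaultdict


def index_files_for(length: int) -> tuple[str, str]:
    """Assumed word-list and word-index file names for a given word length."""
    return f"w{length}.idx", f"i{length}.idx"


def group_by_length(words):
    """Partition words by character length (not byte length)."""
    groups = defaultdict(list)
    for word in words:
        groups[len(word)].append(word)
    return dict(groups)


def lookup(word, word_lists):
    """Return the word's position in its length bucket, or None.

    Only the bucket for len(word) is consulted, so a search never
    scans words of other lengths.
    """
    bucket = word_lists.get(len(word), [])
    try:
        return bucket.index(word)
    except ValueError:
        return None
```

An exact-match search then reads a single, much smaller word list instead of one file holding every word in the wiki.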
* backlinks fixes (bugs #795 & #937) | chris | 2006-11-05
  - add deaccented and romanised page names to index word list
  - remove stop words from tokens used in backlink search
  darcs-hash:20061105195453-9b6ab-6c4989eb75782af60a3de3bddbc99a83de2b4c80.gz
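Deaccenting can be sketched with Unicode compatibility decomposition: decompose each character and drop the combining marks. This is a swapped-in standard-library technique for illustration; DokuWiki uses its own character tables, romanisation of non-Latin scripts is not covered, and `page_name_variants` is a hypothetical helper.

```python
import unicodedata


def deaccent(text: str) -> str:
    """Strip accents: NFKD splits e.g. 'é' into 'e' + U+0301, then marks are dropped."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def page_name_variants(name: str) -> list[str]:
    """The page name plus its deaccented form, de-duplicated and sorted."""
    return sorted({name, deaccent(name)})
```

Indexing both variants lets a search for `cafe` find a page named `café`.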
* search improvements | chris | 2006-08-31
  ft_snippet()
  - make utf8 algorithm default
  - add workaround for utf8_substr() limitations, bug #891
  - fix some indexes which missed out on conversion to utf8 character counts
  - minor improvements
  idx_lookup()
  - minor changes to wildcard matching code to improve performance (changes based on profiling results)
  utf8
  - specifically set mb_internal_encoding to utf-8 when mb_string functions will be used.
  darcs-hash:20060831003413-9b6ab-712021eda3c959ffe79d8d3fe91d2c9a8acf2b58.gz
* update wikiFN with third parameter, $clean | chris | 2006-08-25
  The value defaults to true. The patch also includes an update to idx_parseIndexLine to make use of the new parameter: the index file (if built by DokuWiki's methods) will already contain "clean" IDs.
  darcs-hash:20060825144112-9b6ab-55adc71cf55bb58468fb3f0b03b9001ab149a82b.gz
* fixed stupid bug in search query parser | Andreas Gohr | 2006-06-18
  darcs-hash:20060618134515-7ad00-3097e310ccdaf793b5da3bd49a54723fea7ec260.gz
* changed all occurrences of rename() to io_rename() | Andreas Gohr | 2006-05-07
  darcs-hash:20060507101333-7ad00-e687d797fbee26e0b0bc7741ff8a1af496c101bf.gz
* file cleanups | Andreas Gohr | 2006-02-17
  This patch cleans up the source code to satisfy the coding guidelines (see http://wiki.splitbrain.org/wiki:development#coding_style). It converts files to UNIX line endings and removes tabs and trailing whitespace. Not all files were cleaned yet.
  darcs-hash:20060217222040-7ad00-bba3d2bee3b5aa7cbb5184258abd50805cd071bf.gz
* fixed indexer word counts for UTF-8 words #653 | Osamu Higuchi | 2006-01-27
  darcs-hash:20060126233702-87e23-9382dd77b66f263fa51ad02dc31264c667fdbc70.gz
* Wildcard search added #552 #632 | Andreas Gohr | 2005-11-27
  Searching for word parts is now possible by appending or prepending a * character to the search word:
  'foo*' searches for words beginning with 'foo', e.g. 'foobar'
  '*foo' looks for words ending in 'foo', e.g. 'barfoo'
  '*foo*' gets anything with 'foo' in it, e.g. 'barfoobaz'
  darcs-hash:20051127180723-7ad00-1eb29e812ddaf38d9812697bb1cffffe9a5fb330.gz
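One way to express those three patterns is to translate the search term into an anchored regular expression: a leading `*` drops the start anchor and a trailing `*` drops the end anchor. This is an illustrative sketch with hypothetical helper names, not the code in idx_lookup().

```python
import re


def wildcard_to_regex(term: str) -> re.Pattern:
    """'foo*' -> ^foo, '*foo' -> foo$, '*foo*' -> foo, 'foo' -> ^foo$."""
    starts = term.startswith("*")
    ends = term.endswith("*") and len(term) > 1
    core = re.escape(term.strip("*"))  # the literal part, regex-escaped
    pattern = ("" if starts else "^") + core + ("" if ends else "$")
    return re.compile(pattern)


def match_words(term, words):
    """Return the words matching a wildcard term, in their original order."""
    rx = wildcard_to_regex(term)
    return [w for w in words if rx.search(w)]
```

With `words = ["foobar", "barfoo", "barfoobaz", "foo", "bar"]`, the term `foo*` matches `foobar` and `foo`, while `*foo*` matches every word except `bar`.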
* ignore regexp failures when handling Asian chars | Andreas Gohr | 2005-10-09
  The new handling of Asian chars as single words needs a recent PCRE library (PHP 4.3.10 is known to work). If this support isn't available, the regexp compilation will fail. This patch adds a workaround; as a result, the search will not work as expected with Asian words on older PHP versions.
  darcs-hash:20051009124833-7ad00-1319829be5cb73246e13eb65e4c950d43c6ce5bf.gz
* Asian language support for the indexer #563 | Andreas Gohr | 2005-09-25
  Asian languages do not use spaces to separate words. The indexer, however, does a word-based lookup. Splitting Japanese texts, for example, into real words is only possible with complicated natural language processing, something completely out of scope for DokuWiki. This patch solves the problem by treating every Asian character as a single word. When an Asian word (consisting of multiple characters) is searched, it is treated as a phrase search: each character is looked up by itself first, then the found documents are checked for the phrase.
  darcs-hash:20050925175451-7ad00-933b33b51b5f2fa05e736c18b8db58a5fdbf41ce.gz
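A minimal sketch of that two-step approach, assuming a simplified set of Unicode ranges (kana, CJK ideographs, Hangul syllables) rather than DokuWiki's actual regexp. `tokenize` and `search` are hypothetical names, and the per-character index lookup is modeled as a set check over each document's tokens.

```python
import re

# Assumed, simplified ranges: Hiragana/Katakana, CJK Ext. A, CJK Unified, Hangul.
ASIAN = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")


def tokenize(text):
    """Split on whitespace, treating every Asian character as its own word."""
    spaced = ASIAN.sub(lambda m: f" {m.group(0)} ", text)
    return spaced.split()


def search(query, docs):
    """Phrase search for an Asian word: per-character lookup, then phrase check."""
    chars = tokenize(query)
    tokens = {name: set(tokenize(body)) for name, body in docs.items()}
    # Step 1: candidate documents must contain every character as a token.
    candidates = [n for n, toks in tokens.items() if all(c in toks for c in chars)]
    # Step 2: keep only candidates where the characters appear as a phrase.
    return [n for n in candidates if query in docs[n]]
```

A document containing the same characters in a different order passes step 1 but is discarded in step 2, which is why the phrase check is needed.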
* backlink fix for pages with special characters #548 | Andreas Gohr | 2005-09-21
  darcs-hash:20050921195118-7ad00-9070166cbaa26e3f27f7b92382346a70f5c479a1.gz
* more efficient changelog reading for recent changes | Andreas Gohr | 2005-09-18
  getRecents now reads the changelog backwards in 4KB chunks instead of loading the whole file into an array and rsorting it. This should be more memory-efficient (and probably faster) for large changelogs. Note: the format of the array returned by getRecents changed slightly; plugins relying on it need to be adjusted. Sorry.
  darcs-hash:20050918121008-7ad00-1fdba47d29b0c038c6e4e4edc1d4c93e5ba769e9.gz
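Reading a file backwards in fixed-size chunks can be sketched like this: seek to the end, read 4 KB at a time toward the start, and carry any partial line over to the next chunk. The generator name is hypothetical and blank lines are skipped; it illustrates the technique, not getRecents itself.

```python
import os


def read_lines_backwards(path, chunk_size=4096):
    """Yield the lines of a UTF-8 file from last to first, one chunk at a time."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos = fh.tell()
        tail = b""  # partial line whose start lies in an earlier chunk
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            fh.seek(pos)
            buf = fh.read(step) + tail
            lines = buf.split(b"\n")
            tail = lines.pop(0)  # may be incomplete; prepend to the next chunk
            for line in reversed(lines):
                if line:  # skip empty lines (e.g. after the final newline)
                    yield line.decode("utf-8")
        if tail:
            yield tail.decode("utf-8")
```

Only complete lines are decoded, so a multi-byte UTF-8 character split across a chunk boundary is reassembled before decoding. Because it is a generator, a caller that wants only the N most recent entries can stop early, e.g. with `itertools.islice(read_lines_backwards(path), n)`, without reading the rest of the file.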
* fixed stupid bug in indexer | Andreas Gohr | 2005-09-12
  There was a stupid bug in the indexer which prevented new words from being added (only non-ASCII words were added).
  darcs-hash:20050912145813-7ad00-4351dbb1ab984d97322953c0ba4c9962ad887697.gz
* added missing ACL checks for new index based searches | Andreas Gohr | 2005-09-12
  darcs-hash:20050912143027-7ad00-b2f3165d8db7122a453ecc63ad031af4467f691f.gz
* try faster rename before falling back to copy in indexer | Andreas Gohr | 2005-09-07
  darcs-hash:20050907210643-7ad00-a5cd36dc8b48ca445af87e9f066c7a54a98a3658.gz
* indexer rename bugfix for Win32 | Dave Doyle | 2005-09-06
  darcs-hash:20050906214043-a62d3-65097acf0b035fd6fe9833136a15f9562e69970f.gz
* new fulltext search function using the index | Andreas Gohr | 2005-08-28
  The new search function was added but is not yet integrated into DokuWiki's interface.
  darcs-hash:20050828152821-7ad00-a6e79a9dc5aaf41c547cf42dccdbc3b5bc8d303e.gz
* index lookup function added | Andreas Gohr | 2005-08-27
  darcs-hash:20050827174813-7ad00-fe84d120801b63aaaf9f8482a66d1ed1181851bd.gz
* indexer improvements & fix for underscores | chris | 2005-08-16
  darcs-hash:20050816032408-50fdc-6e41585c9b97d70a218877b8ad169df9117d9965.gz
* much faster implementation of idx_getPageWords() | Chris Smith | 2005-08-15
  darcs-hash:20050815184030-d26fc-bb7d0a36885ddcaa3c680501c54dd7979056f73e.gz
* added stopword support to the indexer, added indexer webbug | Andreas Gohr | 2005-08-14
  darcs-hash:20050814181035-7ad00-ed5d879d29fcee7f925f806456675605b058966a.gz