path: root/inc/fulltext.php
Commit message · Author · Age
...
* Fix backlinks - See FS#1040 · Guy Brand · 2007-03-30
  darcs-hash:20070330215042-19e2d-3528f2412ff044eb45158f349db5bbb5e32d907b.gz
* fixed warning with no search results FS#1088 · Andreas Gohr · 2007-03-03
  darcs-hash:20070303220143-7ad00-5d592dbebaae371c03102b20ae7e0d9e433b378b.gz
* fix for slashes in phrase search #1066 · Andreas Gohr · 2007-02-05
  darcs-hash:20070205191848-7ad00-77ad5a398534a7a64884e155c4607350e0f25a7c.gz
* trim pagename returned by ft_pageLookup · Andreas Gohr · 2006-11-24
  darcs-hash:20061124215413-7ad00-f2bd46b7edf70660cc3e0274bd222eafba1edbc6.gz
* Word-Length Indexer · TNHarris · 2006-11-12
  A modification to the indexer that sorts words based on length. This should make searching a little bit more efficient. After the patch is applied, your old index will be automatically converted to the new format (when you visit a page). The new index format is:
  1. Index files are stored in savedir/index.
  2. Word lists are stored as wlen.idx. This used to be word.idx.
  3. Word indexes are stored as ilen.idx. This used to be index.idx.
  4. The page list, page.idx, is simply copied to the new location.
  Any plugins you have, such as the blog plugin, that read the index files need to be updated.
  darcs-hash:20061112194900-2b9f0-a975498ccf0a1d39c6df73b79bcd028d5e81c389.gz
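As a rough illustration of why grouping words by length helps, here is a minimal in-memory sketch in Python (not DokuWiki's PHP code): a lookup only needs the word list for the search term's length. The class `LengthShardedIndex` and its methods are hypothetical, and plain dicts stand in for the per-length wlen.idx and ilen.idx files.

```python
from collections import defaultdict


class LengthShardedIndex:
    """Hypothetical sketch: one word list and one posting list per word length."""

    def __init__(self):
        # length -> {word: word id}; stands in for a wlen.idx file
        self.word_ids = defaultdict(dict)
        # length -> {word id: set of page ids}; stands in for an ilen.idx file
        self.postings = defaultdict(lambda: defaultdict(set))

    def add(self, page, text):
        for word in text.lower().split():
            ids = self.word_ids[len(word)]
            wid = ids.setdefault(word, len(ids))  # new words get the next id
            self.postings[len(word)][wid].add(page)

    def lookup(self, word):
        word = word.lower()
        # Only the list for this exact length is consulted.
        ids = self.word_ids.get(len(word), {})
        if word not in ids:
            return set()
        return set(self.postings[len(word)][ids[word]])
```

For example, after indexing "Hello wiki world" and "wiki syntax", `lookup("wiki")` returns both page IDs without touching the 5- or 6-letter word lists.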
* backlinks fixes (bugs #795 & #937) · chris · 2006-11-05
  - add deaccented and romanised page names to index word list
  - remove stop words from tokens used in backlink search
  darcs-hash:20061105195453-9b6ab-6c4989eb75782af60a3de3bddbc99a83de2b4c80.gz
* remove unused code · Andreas Gohr · 2006-10-08
  This patch removes some commented code fragments and alternative snippet generators.
  darcs-hash:20061008090624-7ad00-14bfee2ded6c6c8ef43ad02a4c02a5d95ee9daf7.gz
* more utf8_substr improvements (re FS#891 and yesterday's patch) · chris · 2006-09-28
  - rework utf8_substr() NOMBSTRING code to always use pcre
  - remove work around for utf8_substr() and large strings from ft_snippet()
  darcs-hash:20060928165122-9b6ab-0eefc216f07f9d7e7d8eb62ce26605c28ee340fa.gz
* parser caching update · chris · 2006-09-11
  This patch primarily updates p_cached_xhtml() and p_cached_instructions() to allow their caching logic to be surrounded by an event trigger. p_cached_xhtml() has been rewritten as the more general p_cached_output() to support other render output formats besides 'xhtml'. All calls to p_cached_xhtml() have been changed to refer to the new function.
  New event:
    name: PARSER_CACHE_USE
    data: cache object (see below)
    action: determine if cache file can be used
    preventable: yes
    result: bool, true to use cache file, false otherwise
  Cache operations have been generalised in a new class, cache, extended to cache_parser, cache_renderer & cache_instructions. Details can be found in inc/cache.php.
  For handling of the above event, key properties are:
  - page, if present the wiki page id; may not always be present, e.g. when called for locale xhtml files
  - file, source file
  - mode, renderer mode (e.g. 'xhtml') or 'i' for instructions
  Other changes:
  - cache class counts cache hits against attempts; results are stored in {cache_dir}/cache_stats.txt
  - adds metadata dependency to renderer page cache
  - replaces purgefile dependency for renderer cache with metadata 'relation references' (internal link) dependency for wiki pages only
  darcs-hash:20060911021418-9b6ab-19601ed194b8c8e45236ab72c3e23d78bf777e6c.gz
* update backlink search to use metadata · chris · 2006-09-01
  darcs-hash:20060901002016-9b6ab-716518138edf541a869510d7c2934b9474547fc3.gz
* add unittests for bug#891 · chris · 2006-08-31
  darcs-hash:20060831092146-9b6ab-b00aa29c982ab18117f476b3d01d5111915c9d4b.gz
* search improvements · chris · 2006-08-31
  ft_snippet()
  - make utf8 algorithm default
  - add workaround for utf8_substr() limitations, bug #891
  - fix some indexes which missed out on conversion to utf8 character counts
  - minor improvements
  idx_lookup()
  - minor changes to wildcard matching code to improve performance (changes based on profiling results)
  utf8
  - specifically set mb_internal_encoding to utf-8 when mb_string functions will be used.
  darcs-hash:20060831003413-9b6ab-712021eda3c959ffe79d8d3fe91d2c9a8acf2b58.gz
* ft_snippet() update · chris · 2006-08-27
  - correct "opt1" algorithm for multibyte utf8
  - minor improvement to "opt2" for short pages
  - add "utf8" algorithm; this algorithm endeavours to work with whole utf8 characters as much as possible. The resulting snippet will tend to 100 characters, rather than the 100 bytes of "opt1" and "opt2".
  darcs-hash:20060826234333-9b6ab-ae4c60c8855a92b133cb8d5a230098203f610e7b.gz
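The gap between a 100-byte and a 100-character snippet shows up as soon as the text contains multibyte characters. The Python sketch below only illustrates cutting a string by byte count versus character count; it is not the snippet-selection logic of ft_snippet(), and both function names are hypothetical.

```python
def byte_snippet(text, n=100):
    """Cut after n bytes of UTF-8, as a byte-oriented algorithm would."""
    raw = text.encode("utf-8")[:n]
    # A cut inside a multibyte character leaves an incomplete sequence;
    # errors="ignore" drops those stray bytes instead of raising.
    return raw.decode("utf-8", errors="ignore")


def char_snippet(text, n=100):
    """Cut after n characters, so no character is ever split."""
    return text[:n]
```

With a text of 60 copies of "é" (two bytes each in UTF-8), `byte_snippet` keeps only 50 characters, while `char_snippet` keeps all 60.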
* ft_snippet() update, fix utf8 problems · chris · 2006-08-26
  darcs-hash:20060826095311-9b6ab-9a6f272cc7c7532eb2bad8f7b4404c5a16b71109.gz
* code to remove bad UTF-8 bytes added · Andreas Gohr · 2006-08-26
  This adds code to remove or replace invalid UTF-8 bytes and uses it in the ft_snippets function.
  darcs-hash:20060826082919-7ad00-a94004de159ae93ff5b7270fd3e631ff467233cd.gz
* update to previous ft_snippet() patch, improve snippet text selection · chris · 2006-08-25
  darcs-hash:20060825134730-9b6ab-086ee0647af39c4398cf1726324d8215722a39db.gz
* ft_snippet optimisations · chris · 2006-08-25
  This patch includes two alternative algorithms for ft_snippet(), the code that prepares the snippets seen on the search page and the most time-consuming part of producing that page. If you have $conf['allowdebug'] on, you can specify the search algorithm to use by adding &_search
  darcs-hash:20060825104046-9b6ab-942d81a43cf0f85bfd235cabf6c35dd4b20e0b71.gz
* namespace-restricted fulltext-search part2 · Michael Klier <chi@chimeric.de> · 2006-05-18
  - it is now possible to restrict the fulltext-search to multiple namespaces
  Examples:
    searchword @ns1 @ns2 @ns3
    "exact phrase" @ns1 @ns2 @ns3
  darcs-hash:20060518204647-484ab-061521a81f13360e33496e5163e3cd263a9c1ad6.gz
* namespace restricted fulltext-search · Michael Klier <chi@chimeric.de> · 2006-05-18
  - The fulltext-search can now be restricted to a given namespace, marked with a leading "@"
  darcs-hash:20060518161855-484ab-1617b6d2c3593525f4d29a789b0a32ebf414b9ae.gz
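A query in the "@namespace" syntax could be split and applied roughly as in this Python sketch. The parsing rules are assumptions for illustration: tokens starting with "@" name a namespace, double-quoted text is a phrase, and a page matches if its ID starts with one of the namespaces followed by a colon. parse_query() and in_namespaces() are hypothetical helpers, not DokuWiki functions.

```python
import re

# Either a double-quoted phrase (group 1) or a run of non-space characters (group 2).
TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def parse_query(query):
    """Split a query into (words, phrases, namespaces)."""
    words, phrases, namespaces = [], [], []
    for phrase, token in TOKEN.findall(query):
        if token.startswith("@") and len(token) > 1:
            namespaces.append(token[1:])
        elif token:
            words.append(token)
        elif phrase:
            phrases.append(phrase)  # empty quotes are ignored
    return words, phrases, namespaces


def in_namespaces(page_id, namespaces):
    """True if page_id lies in any given namespace (every page if none given)."""
    if not namespaces:
        return True
    return any(page_id.startswith(ns.rstrip(":") + ":") for ns in namespaces)
```

Requiring the trailing colon keeps `@ns1` from also matching pages in a namespace such as `ns10`, while sub-namespaces like `ns1:sub` still match.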
* file cleanups · Andreas Gohr · 2006-02-17
  This patch cleans up the source code to satisfy the coding guidelines (see http://wiki.splitbrain.org/wiki:development#coding_style). It converts files to UNIX line endings and removes tabs and trailing whitespace. Not all files were cleaned yet.
  darcs-hash:20060217222040-7ad00-bba3d2bee3b5aa7cbb5184258abd50805cd071bf.gz
* Wildcardsearch added #552 #632 · Andreas Gohr · 2005-11-27
  Now searching for word parts is possible by appending or prepending a * character to the search word:
  - 'foo*' searches for words beginning with 'foo', e.g. 'foobar'
  - '*foo' looks for words ending in 'foo', e.g. 'barfoo'
  - '*foo*' gets anything with 'foo' in it, e.g. 'barfoobaz'
  darcs-hash:20051127180723-7ad00-1eb29e812ddaf38d9812697bb1cffffe9a5fb330.gz
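The three wildcard forms map naturally onto anchored regular expressions: a leading "*" drops the start anchor and a trailing "*" drops the end anchor. The Python sketch below expands a wildcard term against a list of indexed words; the helper names are hypothetical, and this is not necessarily how idx_lookup() does its matching.

```python
import re


def wildcard_to_regex(term):
    """Translate 'foo*', '*foo', '*foo*' (or plain 'foo') into a compiled regex."""
    core = re.escape(term.strip("*"))
    start = "" if term.startswith("*") else "^"  # '*foo' may start anywhere
    end = "" if term.endswith("*") else "$"      # 'foo*' may end anywhere
    return re.compile(start + core + end)


def expand(term, words):
    """Return the indexed words that match the wildcard term, in index order."""
    pattern = wildcard_to_regex(term)
    return [w for w in words if pattern.search(w)]
```

For the word list `["foobar", "barfoo", "barfoobaz", "foo", "bar"]`, `expand("foo*", ...)` yields `["foobar", "foo"]` and `expand("*foo*", ...)` yields every word containing "foo".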
* hidepages configoption · Andreas Gohr · 2005-11-03
  This new option accepts a RegExp to filter certain pages from all automatic listings (RSS, recent changes, search results, index). This is useful to exclude certain pages like the ones used in the sitebar templates. The regexp is matched against the full page ID with a leading colon. If it matches, the page is assumed to be a hidden one.
  IMPORTANT: this is not related to ACL. A hidden page is still visible to all users (if not restricted by ACL) when linked or called directly.
  darcs-hash:20051103101726-6e07b-8d45912a1b4f6cfc9e3fce147c15f84a58ea7ca2.gz
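A minimal Python sketch of the hidepages check, assuming the option value is used as a regular expression that is searched (not fully matched) against the page ID with a colon prepended. The function names are hypothetical, and PHP regex delimiters and flags are left out. The leading colon lets a single pattern anchor on a namespace boundary.

```python
import re


def is_hidden(page_id, hidepages):
    """Return True if the hidepages pattern matches ':' + page_id."""
    if not hidepages:
        return False  # option unset: nothing is hidden
    return re.search(hidepages, ":" + page_id) is not None


def listable(page_ids, hidepages):
    """Filter an automatic listing; direct access is unaffected (this is not ACL)."""
    return [p for p in page_ids if not is_hidden(p, hidepages)]
```

With the pattern `:sidebar$`, both `sidebar` and `wiki:sidebar` drop out of listings, while `^:sidebar$` hides only the top-level page.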
* ignore regexp failures when handling asian chars · Andreas Gohr · 2005-10-09
  The new handling of asian chars as single words needs a recent PCRE library (PHP 4.3.10 is known to work). If this support isn't available, the regexp compilation will fail. This patch adds a workaround; this means the search will not work as expected with asian words on older PHP versions.
  darcs-hash:20051009124833-7ad00-1319829be5cb73246e13eb65e4c950d43c6ce5bf.gz
* asian language support for the indexer #563 · Andreas Gohr · 2005-09-25
  Asian languages do not use spaces to separate words. The indexer, however, does a word-based lookup. Splitting, for example, Japanese texts into real words is only possible with complicated natural language processing, something completely out of scope for DokuWiki.
  This patch solves the problem by treating all asian characters as single words. When an asian word (consisting of multiple characters) is searched, it is treated as a phrase search: each character is looked up by itself first, then the phrase is checked in the found documents.
  darcs-hash:20050925175451-7ad00-933b33b51b5f2fa05e736c18b8db58a5fdbf41ce.gz
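The approach can be sketched in Python. Every Asian character becomes its own index word. A multi-character Asian search term is answered by intersecting the per-character hits, then checking for the full string in each candidate page. The Unicode ranges (Hiragana, Katakana, CJK ideographs, Hangul syllables) and all function names are assumptions for this sketch, not DokuWiki's actual character list or API.

```python
import re

# Assumed ranges: Hiragana/Katakana, CJK Ext. A, CJK Unified Ideographs, Hangul.
ASIAN = "\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af"
ASIAN_CHAR = re.compile(f"([{ASIAN}])")


def tokenize(text):
    """Split on whitespace, but make every Asian character a word of its own."""
    return ASIAN_CHAR.sub(r" \1 ", text).lower().split()


def build_index(pages):
    """Map each word to the set of page ids containing it."""
    index = {}
    for page, text in pages.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(page)
    return index


def search(term, index, pages):
    """Look up every token, intersect the hits, then verify the phrase."""
    tokens = tokenize(term)
    if not tokens:
        return set()
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    if len(tokens) == 1:
        return hits
    # Several tokens (e.g. a multi-character Asian word): keep only pages
    # where the term occurs as a contiguous string.
    needle = term.lower()
    return {p for p in hits if needle in pages[p].lower()}
```

A page containing both characters of "東京", but not next to each other, passes the per-character lookup and is then rejected by the phrase check.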
* fix for backlinks · Andreas Gohr · 2005-09-25
  darcs-hash:20050925102211-7ad00-200edd676ba3956f03ec5bcc5149d4aa4bd15e24.gz
* backlinkfix for pages with special characters #548 · Andreas Gohr · 2005-09-21
  darcs-hash:20050921195118-7ad00-9070166cbaa26e3f27f7b92382346a70f5c479a1.gz
* added missing ACL checks for new index based searches · Andreas Gohr · 2005-09-12
  darcs-hash:20050912143027-7ad00-b2f3165d8db7122a453ecc63ad031af4467f691f.gz
* backlinks now use the new index based search · Andreas Gohr · 2005-09-12
  darcs-hash:20050912141042-7ad00-5ef43525c9fd7ba44206720c54bb566450f93250.gz
* the search now uses the index · Andreas Gohr · 2005-09-04
  darcs-hash:20050903220229-7ad00-5d95f905eaeb3f6b867aa3ee43c2a8bccc533c00.gz
* new fulltext search function using the index · Andreas Gohr · 2005-08-28
  The new search function was added but is not yet integrated into DokuWiki's interface.
  darcs-hash:20050828152821-7ad00-a6e79a9dc5aaf41c547cf42dccdbc3b5bc8d303e.gz