path: root/lib/exe/indexer.php
Commit message (author, date)
* Merge branch 'master' into indexer_improvements (Michael Hamann, 2011-01-23)
|\    Conflicts: inc/fulltext.php, inc/indexer.php, lib/exe/indexer.php
| * increase indexer version to reforce rebuild for the new title index (Andreas Gohr, 2011-01-16)
| |
| * Remove enc=utf-8 in VIM modeline as it is not allowed in VIM 7.3 (Michael Hamann, 2010-11-29)
| |     As of VIM 7.3 it is no longer possible to specify the encoding in the modeline. This gives an error message whenever such a file is opened, thus this commit removes the enc setting from the modeline.
| * Render metadata when needed (Michael Hamann, 2010-11-22)
| |     This fundamentally changes when metadata is rendered. This commit introduces a new cache file for every page that contains only a timestamp and is updated whenever the metadata of that page is rendered. Metadata is rendered when p_get_metadata is called and the last rendering happened before a page, metadata, configuration or renderer update, or when purge is set, as in the xhtml renderer cache. Metadata is no longer rendered automatically when the xhtml renderer cache isn't used, but it is still rendered when needed because p_get_metadata is called in the cache. Metadata is also no longer rendered in the indexer script when it is missing: pageinfo() already does that before anything else, so the indexer script won't be called when there is no metadata file.
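The staleness rule described in this commit message can be sketched as follows. This is a minimal illustration in Python, not DokuWiki's PHP code; the class `RenderInputs`, the function `metadata_needs_render`, and all field names are hypothetical and chosen only to mirror the wording above.

```python
from dataclasses import dataclass


@dataclass
class RenderInputs:
    """Timestamps (seconds) that the decision depends on. All names are hypothetical."""
    last_render: float      # timestamp stored in the per-page cache file (0 if never rendered)
    page_mtime: float       # last change of the page itself
    meta_mtime: float       # last change of the page's metadata
    config_mtime: float     # last configuration change
    renderer_mtime: float   # last renderer update
    purge: bool = False     # explicit purge request


def metadata_needs_render(inputs: RenderInputs) -> bool:
    """Render if purge is set or the last rendering predates any dependency."""
    if inputs.purge:
        return True
    newest_dependency = max(
        inputs.page_mtime,
        inputs.meta_mtime,
        inputs.config_mtime,
        inputs.renderer_mtime,
    )
    # "before" in the commit message is read as a strict comparison here (assumption).
    return inputs.last_render < newest_dependency
```

The single timestamp per page keeps the check cheap: one comparison against the newest dependency decides whether rendering is needed.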
* | Indexer v3 Rewrite part two, update uses of indexer (Tom N Harris, 2010-12-29)
| |
* | Use a different indexer version when external tokenizer is enabled (Tom N Harris, 2010-11-17)
| |
* | Do not assume that index files will be backward compatible (Tom N Harris, 2010-11-14)
|/
* Introduce metadata write wrapper p_save_metadata (Adrian Lang, 2010-11-13)
|     p_purge_metadata now updates the metadata cache and the INFO array like the other metadata writing functions
* Merge branch 'master', remote branch 'sitemap/master' (Michael Hamann, 2010-11-02)
|\
| * Transformed the sitemapper into a class (Michael Hamann, 2010-09-22)
| |     This makes it possible to autoload the sitemapper when needed.
| * Sitemap rewrite (Michael Hamann, 2010-09-22)
| |
* | Honor allowdebug setting in lib/exe/indexer (Adrian Lang, 2010-10-25)
| |
* | removed deprecated index update function (Andreas Gohr, 2010-10-18)
|/
* Ignore small & own changes in digest & list mails (Adrian Lang, 2010-08-10)
|
* Add title index to the indexer files, improve indexer calls (Adrian Lang, 2010-06-16)
|
* Remove temp indexer upgrade stuff from 579b0f7e (Adrian Lang, 2010-06-16)
|
* Add locking for indexer-based notifications (Adrian Lang, 2010-05-05)
|
* Merge branch 'requireall' (Andreas Gohr, 2010-03-12)
|\    Conflicts: inc/fulltext.php
| * removed require's in lib/exe/* (Andreas Gohr, 2010-02-01)
| |
* | Correct subscribe config parameter name (Adrian Lang, 2010-02-08)
| |
* | Fix documentation for subscribe_time (Adrian Lang, 2010-02-08)
|/
* Fix $info var reference in digest send (Adrian Lang, 2010-01-20)
|     darcs-hash:20091130135040-e4919-40b6614fe28ea07dc5796661ddda6d005264ddbc.gz
* wrong function name fixed (Andreas Gohr, 2010-01-20)
|     Ignore-this: b74163181c2e41d3be022a6185f3e1c1
|     darcs-hash:20091124115805-6e07b-e808cf44a00a65ff8c70cc7e8de4dfedadf96cbd.gz
* correctly handle permissions in digest mailer (Andreas Gohr, 2010-01-20)
|     Ignore-this: c34455078907459a846cf7f00e2b586b
|     darcs-hash:20091123161603-6e07b-927477d6ca50e665228487eb0d3ce9787dbe455b.gz
* Add events to subscription. (Adrian Lang, 2010-01-20)
|
* New mail subscription with digest (Adrian Lang, 2010-01-20)
|
* Emit less E_NOTICEs and E_STRICTs (Adrian Lang, 2009-11-04)
|     Changes of behaviour are:
|     * Allow the user name, title & description “0”
|     * Default to Port 443 if using HTTPS
|     * Set $INFO['isadmin'] and $INFO['ismanager'] to “false” even if no user is logged in
|     * Do not pass empty fragment field in the event data for event ACTION_SHOW_REDIRECT
|     * Handle chunked encoding in HTTPClient
|     darcs-hash:20091104100115-e4919-5cf6397d4a457e3f98a8ca49fbdab03f2147721d.gz
* Updated Microsoft sitemap ping URL for bing (Andreas Gohr, 2009-10-14)
|     Ignore-this: bab31d8f21840cf36b3e6fbe9c0b1b63
|     darcs-hash:20091014112449-6e07b-c298b41cfee8940c01f515b399fcf1a2da0fb237.gz
* Prevent unnecessary updates of the changelog (FS#1758) (Mykola Ostrovskyy, 2009-09-20)
|     Ignore-this: 5653cc47ce2ee6412ba82c82eb2b45fe
|     darcs-hash:20090920171954-40dc4-0c4249b424314a930cdcbe710796db2820330aef.gz
* removed importoldchangelog and importoldindex plugins (Andreas Gohr, 2009-01-25)
|     Ignore-this: fb48b24cecb52541a728ba9c17597d8f
|     These one-shot plugins were used for upgrading older DokuWiki versions and are no longer needed. If you upgrade from a really old version you might want to upgrade to intermediate versions instead.
|     darcs-hash:20090125143050-7ad00-5ff7b2cd5f61c392e9e02e13eab947d045d60b04.gz
* Media changelog added (michael, 2009-01-18)
|     There is now a new media changelog. With the flag RECENTS_MEDIA_CHANGES, media changes can be requested from the getRecents() function or from the new getRecentsSince() function, which returns all changes since a given timestamp and optionally before a given timestamp. The media upload and the XML-RPC server have been changed to use these functions. Additionally, the event MEDIA_UPLOAD_FINISH has been extended: it has a new $data attribute (the 5th) that contains a boolean indicating whether the file already exists and will be overwritten.
|     darcs-hash:20090118154345-074e0-5d9a90d269e86d8c6a156ecce5cf63115c827433.gz
* fixed the sitemap submission URL for MS Live Search (Andreas Gohr, 2008-06-23)
|     darcs-hash:20080623175256-7ad00-4e6ec21196db228d47dbfede6294613567dbb762.gz
* INDEXER_TASKS_RUN event for index-time hooks (Tom N Harris, 2008-02-26)
|     The event INDEXER_TASKS_RUN is fired by lib/exe/indexer.php when a page is viewed. Plugins should only hook BEFORE the event if it is important for the task to be run as often as possible. Otherwise, hook AFTER the event to be run only when other tasks have completed. Plugin authors must call stopPropagation() and preventDefault() if any work is done. If your plugin does nothing, then you must allow the event to continue. Not following these rules may cause DokuWiki to exceed the PHP execution time limit.
|     darcs-hash:20080226011940-6942e-09291b73bab84a2c4445b1d1c4de8b3bba743243.gz
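The hook rules in this message amount to "at most one unit of work per page view". The sketch below models that idea in Python under stated assumptions: the `Event` class, `run_tasks`, and the exact ordering (BEFORE hooks, then built-in tasks, then AFTER hooks only if nothing else did work) are a reading of the commit text, not DokuWiki's actual event API.

```python
class Event:
    """Simplified stand-in for an event object (hypothetical, not Doku_Event)."""

    def __init__(self):
        self.default_allowed = True
        self.propagate = True

    def prevent_default(self):
        self.default_allowed = False

    def stop_propagation(self):
        self.propagate = False


def run_tasks(before_hooks, builtin_tasks, after_hooks):
    """Run hooks and tasks until one of them reports work; return a label for it."""
    event = Event()
    for hook in before_hooks:
        hook(event)
        if not event.propagate:
            break
    if not event.default_allowed:
        # A BEFORE hook did work and claimed this request.
        return "before-hook"
    for task in builtin_tasks:
        if task():  # each task returns True when it actually did something
            return task.__name__
    # Only reached when no built-in task had anything to do.
    for hook in after_hooks:
        hook(event)
        if not event.propagate:
            return "after-hook"
    return "nothing"
```

A plugin that does work calls both `stop_propagation()` and `prevent_default()`, which is what keeps a single request from accumulating several expensive tasks and running into the execution time limit.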
* Fix border condition on recent change update (Tom N Harris, 2007-10-16)
|     darcs-hash:20071015225711-6942e-4d540e23e3c2ab62e378b0b9bc3cb80041c79350.gz
* don't use fullpath() before initialized (Andreas Gohr, 2007-09-30)
|     darcs-hash:20070930201133-7ad00-a35a6c40f880116009efd9e50cb002bd75733369.gz
* don't use realpath() anymore (FS#1261 and others) (Andreas Gohr, 2007-09-30)
|     The use of realpath() to clean up relative file names caused some trouble in certain setups relying on symlinks or having restrictive file structure setups. This patch replaces all realpath() calls with a PHP-only replacement which should solve those problems.
|     darcs-hash:20070930184250-7ad00-512ff04c95f57fc9eaf104f80372237a3c94286f.gz
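The idea of cleaning a path purely as a string, without asking the filesystem to resolve symlinks, can be illustrated as below. This is a Python sketch for POSIX-style paths only; `clean_path` is a hypothetical name and the code is not the replacement function the commit introduced.

```python
def clean_path(path: str) -> str:
    """Collapse '.', '..' and duplicate slashes without touching the filesystem."""
    is_absolute = path.startswith("/")
    parts = []
    for segment in path.split("/"):
        if segment in ("", "."):
            continue  # empty segments come from duplicate or trailing slashes
        if segment == "..":
            if parts and parts[-1] != "..":
                parts.pop()          # step back over the previous segment
            elif not is_absolute:
                parts.append("..")   # a relative path may climb above its start
            # '..' at the root of an absolute path is simply dropped
            continue
        parts.append(segment)
    joined = "/".join(parts)
    if is_absolute:
        return "/" + joined
    return joined or "."
```

Because no symlink is ever resolved, a path under a symlinked directory stays under that directory name, which is the behaviour the setups mentioned above rely on.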
* Remove obsolete words from search index (Tom N Harris, 2007-09-19)
|     Creates another index file 'pagewords.idx' for the words in each page. Words that are deleted from a page can then be removed from the word index. The indexer version is incremented to force rebuilding of the index. Also, a minor flaw in the regexp for asian words is fixed.
|     darcs-hash:20070919194244-6942e-2e08157dcf4fdf166b35b36a0faf8a3dfb415ad9.gz
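Why a per-page word list makes removal possible can be shown with two in-memory maps. The sketch below is Python with hypothetical names (`update_page`, `word_index`, `page_words`); the real indexer works on index files, and dropping an emptied word entry entirely is a simplification assumed here.

```python
def update_page(word_index, page_words, page, new_words):
    """Re-index one page.

    word_index: dict mapping word -> set of pages containing it
    page_words: dict mapping page -> set of words it contained at last indexing
    """
    new_words = set(new_words)
    old_words = page_words.get(page, set())

    # Words that disappeared from the page are removed from the word index.
    for word in old_words - new_words:
        pages = word_index.get(word)
        if pages is not None:
            pages.discard(page)
            if not pages:
                del word_index[word]  # simplification: forget unused words

    # Newly introduced words are added.
    for word in new_words - old_words:
        word_index.setdefault(word, set()).add(page)

    page_words[page] = new_words
```

Without the reverse map, finding the stale entries for a page would mean scanning every word in the index.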
* improved writability check for sitemap FS#1093 (Andreas Gohr, 2007-03-03)
|     darcs-hash:20070303192836-7ad00-fe821c42ba7541f58ab52b9d8d11b3241bc90b65.gz
* workaround config for FS#852 (Andreas Gohr, 2007-02-08)
|     On certain platforms the ignore_user_abort function does not work as expected, resulting in a non-working indexer webbug. Users with such a broken system (IIS+PHP as CGI) can enable this option to work around the problem (resulting in longer load times for the webbug).
|     darcs-hash:20070208195145-7ad00-8fc14f9da535a70fa837066773e15a3926b077c7.gz
* string for constant fix (Ben Coburn, 2006-12-07)
|     darcs-hash:20061207075815-05dcb-81fad7f4e40142e01f9f1aaa56f47fa51f978186.gz
* Indexer asian language fixes and speed-ups (Tom N Harris, 2006-11-17)
|     Make Chinese and Japanese work better with the new indexer. Some missing punctuation added to utf8_stripspecials. Misc. other changes to make indexing faster. The indexes will expire on backend upgrades, so you don't have to delete *.indexed
|     darcs-hash:20061117123032-6942e-774b38e08234928c49b37e40addba375acf67ac0.gz
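One way an index can "expire on backend upgrades" is to store a version number alongside the per-page marker and compare it on every run. The following Python sketch assumes exactly that; `INDEXER_VERSION`, `needs_reindex`, and the idea that the marker's modification time is compared with the page's are illustrative assumptions, not a description of the actual file contents.

```python
INDEXER_VERSION = 3  # hypothetical current version number


def needs_reindex(stored_version, marker_mtime, page_mtime,
                  current_version=INDEXER_VERSION):
    """Decide whether a page must be indexed again.

    stored_version: version recorded when the page was last indexed (None if never)
    marker_mtime:   modification time of the per-page marker
    page_mtime:     modification time of the page itself
    """
    if stored_version != current_version:
        return True  # backend changed: the old index entry has expired
    return marker_mtime < page_mtime  # page edited after it was indexed
```

With such a check, bumping the version constant is enough to trigger a gradual rebuild, so nobody has to delete marker files by hand.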
* sitemapper update (Andreas Gohr, 2006-11-17)
|     The Google sitemap protocol was recently adopted by Yahoo and Microsoft and made a common standard. This patch changes the XML namespace URL to the new sitemaps.org site and ups the version to 0.9.
|     Pinging of Yahoo and Microsoft was added to the existing Google ping. The Microsoft ping currently fails with a "Bad format" error for unknown reasons. This will hopefully change when either Microsoft fixes their URL or releases some documentation.
|     darcs-hash:20061117150030-7ad00-0fac1cba07926c3ffe687a8cbaf465e8de3abcd7.gz
* fixes for stricter php5 typing (bug#978) (chris, 2006-11-13)
|     darcs-hash:20061113122645-9b6ab-e5f5be2e88eea7eb00643e6a5210086f46191c30.gz
* Word-Length Indexer (TNHarris, 2006-11-12)
|     A modification to the indexer that sorts words based on length. This should make searching a little bit more efficient. After the patch is applied, your old index will be automatically converted to the new format (when you visit a page).
|     The new index format is:
|     1. Index files are stored in savedir/index
|     2. Word lists are stored as wlen.idx. This used to be word.idx.
|     3. Word indexes are stored as ilen.idx. This used to be index.idx.
|     4. The page list, page.idx, is simply copied to the new location.
|     Any plugins you have, such as the blog plugin, that read the index files need to be updated.
|     darcs-hash:20061112194900-2b9f0-a975498ccf0a1d39c6df73b79bcd028d5e81c389.gz
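The bucketing described in the list above can be sketched as follows, assuming that "len" in wlen.idx stands for the word length (so five-letter words would land in w5.idx). This is a Python illustration with a hypothetical function name; it counts characters, which may differ from how the real indexer measures length.

```python
from collections import defaultdict


def group_words_by_length(words):
    """Return a mapping of word-list file name -> sorted words of that length."""
    buckets = defaultdict(list)
    for word in sorted(set(words)):
        buckets[len(word)].append(word)
    # File naming follows the 'wlen.idx' pattern from the commit message (assumed reading).
    return {f"w{length}.idx": bucket for length, bucket in sorted(buckets.items())}
```

A lookup for a given word then only has to open the one bucket matching its length instead of scanning a single large word list.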
* update to previous changes cache patch (chris, 2006-09-24)
|     - fix potential array key collisions
|     - restore ability to keep a minimum number ($conf['recent']) of recent changes regardless of date of change
|     darcs-hash:20060924162105-9b6ab-06350f04f9d9ac4c362f13787b682ef70887a1fc.gz
* fix for sitemap creation with new compression option #919 (Andreas Gohr, 2006-09-24)
|     darcs-hash:20060924100606-7ad00-7e0bc1fa7778669ac352f8d8994acbb7517323cd.gz
* fix recent changes cache ordering (chris, 2006-09-24)
|     This patch fixes a bug in indexer.php which resulted in the order of the recent changes cache being reversed each time it was trimmed. It also adds sorting to both getRecents() and runTrimRecentChanges() as a defensive measure against the order of the file being corrupted.
|     darcs-hash:20060923235109-9b6ab-0f4062c1b02449cce9382426174cd22d71387e5a.gz
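A trim step that cannot reverse or corrupt the order is one that sorts before it cuts. The Python sketch below shows that, together with the minimum-count rule from the follow-up patch listed just above. The function name, the tuple layout `(timestamp, page_id)`, and the oldest-first file order are assumptions for illustration.

```python
def trim_recent_changes(entries, now, max_age_seconds, keep_minimum):
    """Return the entries to keep, oldest first.

    entries: iterable of (timestamp, page_id) tuples in any order
    """
    # Defensive sort: the result is ordered no matter how the input was ordered.
    ordered = sorted(entries, key=lambda entry: entry[0])
    cutoff = now - max_age_seconds
    fresh = [entry for entry in ordered if entry[0] >= cutoff]
    if len(fresh) < keep_minimum:
        # Keep the newest entries regardless of their age.
        fresh = ordered[-keep_minimum:]
    return fresh
```

Because the output is always sorted the same way, running the trim repeatedly leaves the order unchanged.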
* no gzipping in indexer.php (Andreas Gohr, 2006-09-17)
|     darcs-hash:20060917140718-7ad00-ab1b95974ad63966c770f787112bc7c2e285c394.gz
* suppress boring errors (Ben Coburn, 2006-09-08)
|     Suppress any errors from set_time_limit, unlink, and file_exists functions. see: http://www.freelists.org/archives/dokuwiki/09-2006/msg00004.html
|     darcs-hash:20060908193433-05dcb-013617431870ab5bfb2ce8c6e99ba5af13493228.gz
* scalable changelog redesign (Ben Coburn, 2006-08-30)
|     This patch provides a rewritten changelog system that is designed to run efficiently on both small and large wikis. The patch includes a plugin to convert changelogs from the current format. The conversion is non-destructive and happens automatically. For more information on the new changelog format see "http://wiki.splitbrain.org/wiki:changelog".
|
|     Structure
|     In short the changelog is now stored in per-page changelog files, with a recent changes cache. The recent changes cache is kept in "/data/meta/_dokuwiki.changes" and trimmed daily. The per-page changelogs are kept in "/data/meta/<ns>/<page_id>.changes" files. To preserve revision information for revisions stored in the attic, the "*.changes" files are not removed when their page is deleted. This allows the full life-cycle of page creation, deletion, and reversion to be tracked.
|
|     Format
|     The changelog line format now uses a general "line type" field in place of the special "minor" change syntax. There is also an extra field that can be used to store arbitrary data associated with special line types. The reverted line type (R) is a good example. There the extra field holds the revision date used as the source for reverting the page. See the wiki for the complete syntax description.
|
|     Code Notes
|     The changelog functions have been rewritten to load the whole file only if it is small. For larger files, the function loads only the relevant chunk(s). Parsed changelog lines are cached in memory to speed future function calls.
|
|     getRevisionInfo
|     A binary search is used to locate the chunk expected to contain the requested revision. The whole chunk is parsed, and adjacent lines are optimistically cached to speed consecutive calls.
|
|     getRevisions
|     Reads the changelog file backwards (newest first) in chunks until the requested number of lines have been read. Parsed changelog lines are cached for subsequent calls to getRevisionInfo. Because revisions are read from the changelog they are no longer guaranteed to exist in the attic.
|
|     (Note: Even with lines of arbitrary length getRevisionInfo and getRevisions never split changelog lines while reading. This is done by sliding the "file pointer" forward to the end of a line after each blind seek.)
|
|     isMinor
|     Removed. To detect a minor edit check the type as follows: $parsed_logline['type']
|
|     darcs-hash:20060830182753-05dcb-1c5ea17f581197a33732a8d11da223d809c03506.gz
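The combination of a binary search with "sliding the file pointer forward to the end of a line after each blind seek" can be sketched as below. This Python version is a line-level simplification of the chunk-based search the message describes, not the actual implementation. It assumes each line starts with an integer timestamp followed by a tab and that lines are sorted oldest first; `find_revision_line` is a hypothetical name.

```python
import os


def find_revision_line(path, wanted_timestamp):
    """Binary-search a sorted changelog file for a timestamp; return the line or None."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        low, high = 0, handle.tell()  # the wanted line starts somewhere in [low, high)
        while low < high:
            mid = (low + high) // 2
            if mid > 0:
                # Blind seek, then slide forward to the next line start at or after mid.
                handle.seek(mid - 1)
                handle.readline()
            else:
                handle.seek(0)
            line_start = handle.tell()
            if line_start >= high:
                high = mid  # no line starts in [mid, high)
                continue
            line = handle.readline()
            timestamp = int(line.split(b"\t", 1)[0])  # assumed line layout
            if timestamp == wanted_timestamp:
                return line.decode("utf-8").rstrip("\n")
            if timestamp < wanted_timestamp:
                low = handle.tell()  # continue after this line
            else:
                high = mid           # continue before this position
    return None
```

Seeking to `mid - 1` before discarding the partial line means a line that begins exactly at `mid` is not skipped, so no line is ever split or missed regardless of line length.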