path: root/lib/exe/indexer.php
Commit message | Author | Age
...
* don't use realpath() anymore (FS#1261 and others) | Andreas Gohr | 2007-09-30
  The use of realpath() to clean up relative file names caused some trouble in certain setups that rely on symlinks or have restrictive file structures. This patch replaces all realpath() calls with a PHP-only replacement, which should solve those problems.
  darcs-hash:20070930184250-7ad00-512ff04c95f57fc9eaf104f80372237a3c94286f.gz
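A minimal sketch of a purely lexical path cleanup of the kind this patch describes. It is written in Python rather than PHP, the name `clean_path` is hypothetical, and it is not DokuWiki's actual replacement function. Unlike realpath(), it never consults the filesystem, so symlinks are left alone and missing directories are not an error.

```python
def clean_path(path):
    """Collapse '.', '..' and repeated slashes without touching the filesystem.

    Hypothetical helper illustrating a realpath()-free cleanup; symlinks are
    never resolved because no filesystem call is made.
    """
    is_absolute = path.startswith("/")
    parts = []
    for segment in path.split("/"):
        if segment in ("", "."):
            continue  # empty segments come from '//'; '.' means "stay here"
        if segment == "..":
            if parts and parts[-1] != "..":
                parts.pop()  # step up one directory
            elif not is_absolute:
                parts.append("..")  # relative path climbing above its start
            # for absolute paths, '..' above the root is simply dropped
            continue
        parts.append(segment)
    cleaned = "/".join(parts)
    if is_absolute:
        return "/" + cleaned
    return cleaned or "."
```

Because `..` is applied lexically, `link/..` collapses to `.` even when `link` is a symlink pointing elsewhere. That trade-off is what avoids the symlink problems described above.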
* Remove obsolete words from search index | Tom N Harris | 2007-09-19
  Creates another index file, 'pagewords.idx', for the words in each page. Words that are deleted from a page can then be removed from the word index. The indexer version is incremented to force rebuilding of the index. Also, a minor flaw in the regexp for Asian words is fixed.
  darcs-hash:20070919194244-6942e-2e08157dcf4fdf166b35b36a0faf8a3dfb415ad9.gz
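The idea behind 'pagewords.idx' can be sketched with in-memory dictionaries. Remembering which words a page had at its last indexing lets the indexer compute which words disappeared and drop the page from those entries. The names and the set-based layout are hypothetical. DokuWiki's real index files use a different on-disk format with per-page counts.

```python
def update_page_words(word_index, page_words, page, words):
    """Re-index one page and remove it from words it no longer contains.

    word_index: word -> set of pages containing it (hypothetical in-memory index)
    page_words: page -> set of words seen at the last indexing (the pagewords role)
    """
    old_words = page_words.get(page, set())
    new_words = set(words)
    # Words that vanished from the page: drop the page from their entries
    for word in old_words - new_words:
        pages = word_index.get(word)
        if pages is not None:
            pages.discard(page)
            if not pages:
                del word_index[word]  # no page uses this word any more
    # Words currently on the page: make sure the page is listed for each
    for word in new_words:
        word_index.setdefault(word, set()).add(page)
    page_words[page] = new_words
```

Without the per-page word list, the second step alone would leave "index" pointing at a page that no longer contains it.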
* improved writability check for sitemap FS#1093 | Andreas Gohr | 2007-03-03
  darcs-hash:20070303192836-7ad00-fe821c42ba7541f58ab52b9d8d11b3241bc90b65.gz
* workaround config for FS#852 | Andreas Gohr | 2007-02-08
  On certain platforms the ignore_user_abort function does not work as expected, which results in a non-working indexer webbug. Users with such a broken system (IIS+PHP as CGI) can enable this option to work around the problem, at the cost of longer load times for the webbug.
  darcs-hash:20070208195145-7ad00-8fc14f9da535a70fa837066773e15a3926b077c7.gz
* string for constant fix | Ben Coburn | 2006-12-07
  darcs-hash:20061207075815-05dcb-81fad7f4e40142e01f9f1aaa56f47fa51f978186.gz
* Indexer Asian language fixes and speed-ups | Tom N Harris | 2006-11-17
  Make Chinese and Japanese work better with the new indexer. Some missing punctuation was added to utf8_stripspecials. Miscellaneous other changes make indexing faster. The indexes will expire on backend upgrades, so you don't have to delete the *.indexed files.
  darcs-hash:20061117123032-6942e-774b38e08234928c49b37e40addba375acf67ac0.gz
* sitemapper update | Andreas Gohr | 2006-11-17
  The Google sitemap protocol was recently adopted by Yahoo and Microsoft and made a common standard. This patch changes the XML namespace URL to the new sitemaps.org site and bumps the version to 0.9. Pinging of Yahoo and Microsoft was added to the existing Google ping. The Microsoft ping currently fails with a "Bad format" error for an unknown reason. This will hopefully change when Microsoft either fixes their URL or releases some documentation.
  darcs-hash:20061117150030-7ad00-0fac1cba07926c3ffe687a8cbaf465e8de3abcd7.gz
* fixes for stricter php5 typing (bug#978) | chris | 2006-11-13
  darcs-hash:20061113122645-9b6ab-e5f5be2e88eea7eb00643e6a5210086f46191c30.gz
* Word-Length Indexer | TNHarris | 2006-11-12
  A modification to the indexer that sorts words based on length. This should make searching a little more efficient.
  After the patch is applied, your old index will be automatically converted to the new format (when you visit a page). The new index format is:
  1. Index files are stored in savedir/index.
  2. Word lists are stored as w<len>.idx. This used to be word.idx.
  3. Word indexes are stored as i<len>.idx. This used to be index.idx.
  4. The page list, page.idx, is simply copied to the new location.
  Any plugins you have that read the index files, such as the blog plugin, need to be updated.
  darcs-hash:20061112194900-2b9f0-a975498ccf0a1d39c6df73b79bcd028d5e81c389.gz
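Length bucketing can be sketched with in-memory lists. Each word goes into the list for its length, so a lookup only scans words of the same length as the query. The file naming reads the commit's word-list name as `w` followed by the word length (e.g. `w4.idx`). That reading is an assumption. Sorting each bucket is also a choice made only for this sketch, and both function names are hypothetical.

```python
from collections import defaultdict


def split_by_length(words):
    """Group unique words into per-length word lists keyed by an index file name."""
    buckets = defaultdict(list)
    for word in sorted(set(words)):
        buckets[len(word)].append(word)
    return {f"w{length}.idx": bucket for length, bucket in sorted(buckets.items())}


def lookup(lists, word):
    """Scan only the list for len(word); return the word's line number or -1."""
    bucket = lists.get(f"w{len(word)}.idx", [])
    return bucket.index(word) if word in bucket else -1
```

The line number returned here is what would select the matching line in the parallel per-length word index.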
* update to previous changes cache patch | chris | 2006-09-24
  - fix potential array key collisions
  - restore the ability to keep a minimum number ($conf['recent']) of recent changes regardless of the date of change
  darcs-hash:20060924162105-9b6ab-06350f04f9d9ac4c362f13787b682ef70887a1fc.gz
* fix for sitemap creation with new compression option #919 | Andreas Gohr | 2006-09-24
  darcs-hash:20060924100606-7ad00-7e0bc1fa7778669ac352f8d8994acbb7517323cd.gz
* fix recent changes cache ordering | chris | 2006-09-24
  This patch fixes a bug in indexer.php which resulted in the order of the recent changes cache being reversed each time it was trimmed. It also adds sorting to both getRecents() and runTrimRecentChanges() as a defensive measure against the order of the file being corrupted.
  darcs-hash:20060923235109-9b6ab-0f4062c1b02449cce9382426174cd22d71387e5a.gz
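This sketch shows a trimming step that sorts before cutting, so a reversed or scrambled cache always comes back oldest-first. It also keeps a minimum number of entries, matching the `$conf['recent']` behaviour restored in the 2006-09-24 follow-up. It assumes each cache line starts with a Unix timestamp followed by a tab. `trim_recent_changes` and its parameters are hypothetical names, not DokuWiki's runTrimRecentChanges().

```python
def trim_recent_changes(lines, now, max_age, min_keep):
    """Keep entries newer than now - max_age, but never fewer than min_keep.

    Assumes every line starts with a Unix timestamp followed by a tab.
    The result is sorted oldest-first regardless of the input order.
    """
    def timestamp(line):
        return int(line.split("\t", 1)[0])

    entries = sorted(lines, key=timestamp)  # defensive: repair any ordering damage
    cutoff = now - max_age
    # Because entries are sorted, the recent ones form a suffix of the list
    kept = [line for line in entries if timestamp(line) >= cutoff]
    if len(kept) < min_keep:
        kept = entries[-min_keep:]  # fall back to the min_keep newest entries
    return kept
```

Sorting on every trim costs O(n log n) on a file that is already small after daily trimming. In exchange, a single bad write can no longer flip the order permanently.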
* no gzipping in indexer.php | Andreas Gohr | 2006-09-17
  darcs-hash:20060917140718-7ad00-ab1b95974ad63966c770f787112bc7c2e285c394.gz
* suppress boring errors | Ben Coburn | 2006-09-08
  Suppress any errors from the set_time_limit, unlink, and file_exists functions. See: http://www.freelists.org/archives/dokuwiki/09-2006/msg00004.html
  darcs-hash:20060908193433-05dcb-013617431870ab5bfb2ce8c6e99ba5af13493228.gz
* scalable changelog redesign | Ben Coburn | 2006-08-30
  This patch provides a rewritten changelog system that is designed to run efficiently on both small and large wikis. The patch includes a plugin to convert changelogs from the current format. The conversion is non-destructive and happens automatically. For more information on the new changelog format see "http://wiki.splitbrain.org/wiki:changelog".
  Structure
  In short, the changelog is now stored in per-page changelog files, with a recent changes cache. The recent changes cache is kept in "/data/meta/_dokuwiki.changes" and trimmed daily. The per-page changelogs are kept in "/data/meta/<ns>/<page_id>.changes" files. To preserve revision information for revisions stored in the attic, the "*.changes" files are not removed when their page is deleted. This allows the full life cycle of page creation, deletion, and reversion to be tracked.
  Format
  The changelog line format now uses a general "line type" field in place of the special "minor" change syntax. There is also an extra field that can be used to store arbitrary data associated with special line types. The reverted line type (R) is a good example: there, the extra field holds the revision date used as the source for reverting the page. See the wiki for the complete syntax description.
  Code Notes
  The changelog functions have been rewritten to load the whole file only if it is small. For larger files, the functions load only the relevant chunk(s). Parsed changelog lines are cached in memory to speed up future function calls.
  getRevisionInfo
  A binary search is used to locate the chunk expected to contain the requested revision. The whole chunk is parsed, and adjacent lines are optimistically cached to speed up consecutive calls.
  getRevisions
  Reads the changelog file backwards (newest first) in chunks until the requested number of lines have been read. Parsed changelog lines are cached for subsequent calls to getRevisionInfo. Because revisions are read from the changelog, they are no longer guaranteed to exist in the attic.
  (Note: Even with lines of arbitrary length, getRevisionInfo and getRevisions never split changelog lines while reading. This is done by sliding the "file pointer" forward to the end of a line after each blind seek.)
  isMinor
  Removed. To detect a minor edit, check the type as follows: $parsed_logline['type']
  darcs-hash:20060830182753-05dcb-1c5ea17f581197a33732a8d11da223d809c03506.gz
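The "blind seek, then slide forward to the next line" technique can be combined with a binary search over byte offsets, as sketched below. It assumes a changelog whose lines are in ascending timestamp order and start with a Unix timestamp followed by a tab. It works one line at a time rather than on cached chunks, so it illustrates the idea and is not DokuWiki's getRevisionInfo(). Both function names are hypothetical.

```python
import io


def _line_at_or_after(f, offset):
    """Seek to offset, then slide forward to the first line starting at or after it.

    Returns (line_start, line_bytes), or (None, b"") if no such line exists.
    """
    if offset == 0:
        f.seek(0)
    else:
        f.seek(offset - 1)
        f.readline()  # discard the partial line containing byte offset-1
    start = f.tell()
    line = f.readline()
    return (start, line) if line else (None, b"")


def find_revision(f, rev):
    """Binary-search a timestamp-sorted changelog (binary file object) for rev."""
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell()
    # Invariant: only lines starting inside [lo, hi) can still match
    while lo < hi:
        mid = (lo + hi) // 2
        start, line = _line_at_or_after(f, mid)
        if start is None or start >= hi:
            hi = mid  # no line starts in [mid, hi): search the lower half
            continue
        timestamp = int(line.split(b"\t", 1)[0])
        if timestamp == rev:
            return line.rstrip(b"\n").decode("utf-8")
        if timestamp < rev:
            lo = start + len(line)  # everything up to this line is too old
        else:
            hi = start  # this line and everything after it are too new
    return None
```

Each probe costs one seek plus at most two line reads, so lookups stay logarithmic in file size however long individual lines are.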
* check if ignore_user_abort was successful (maybe fix for #852) | Andreas Gohr | 2006-07-01
  darcs-hash:20060701120325-7ad00-07efe9cacd51043ad95d8d2d71d8680036721286.gz
* fixed google sitemap pinging #815 | Andreas Gohr | 2006-05-29
  darcs-hash:20060529183003-7ad00-de0e3acac75a9f94f6c27f67651eeabe40411d7a.gz
* fix for sitemap creation #813 | Andreas Gohr | 2006-05-27
  darcs-hash:20060526223358-7ad00-2bdfd39a5dd8ca09101288834cc75e5e963afda5.gz
* more info is gathered on metaupdate in background indexer | Andreas Gohr | 2006-05-11
  The background indexer now gathers info on contributors and modification dates from the changelog when adding the missing meta info. A new io_grep function was added, which might be useful for other parts of the wiki as well.
  darcs-hash:20060511191450-7ad00-baba1b48ea03b823c88a480862c612316f159b5a.gz
* metadata handling updates, header fixes | Andreas Gohr | 2006-05-07
  This removes the meta instruction again in favour of the new meta renderer. Most tests work again now; a few tweaks were made to the header handler to render certain headers as it did in earlier versions.
  darcs-hash:20060507153113-7ad00-bd299fbe1762482c72d109f9bca776f12bcea7c8.gz
* metadata enhancements | Andreas Gohr | 2006-04-30
  This adds metadata rendering to the indexer process to build missing metadata in the background. p_get_first_heading was adjusted to make use of the new metadata mechanisms. A problem with uninitialized arrays in p_set_metadata and PHP5 was fixed (I think).
  darcs-hash:20060430181740-7ad00-8952fa6beb4fadf6b4321627998442d34febfc8d.gz
* simplified file permission handling | Andreas Gohr | 2006-03-04
  This patch simplifies the configuration of the file and directory creation modes. There is no need to set the umask anymore; only the wanted permissions for files and directories are set. An init function compares the wanted modes with the ones the system would choose automatically (based on the system's umask) and sets the modes for chmod when needed.
  darcs-hash:20060304154038-7ad00-5ef1db3a87e42563a602f9d050c681d2ea74682f.gz
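The comparison the init function performs can be sketched as follows. It assumes the usual POSIX behaviour, where new directories are requested with mode 0o777 and new files with 0o666 before the umask is applied. The function names are hypothetical. `os.umask` is the real standard-library call, and it can only be read by setting it, hence the restore.

```python
import os


def default_mode(umask, is_dir):
    """Mode the OS gives a new file or directory under the given umask.

    Assumes the common request modes: 0o777 for directories, 0o666 for files.
    """
    requested = 0o777 if is_dir else 0o666
    return requested & ~umask


def chmod_needed(wanted, umask, is_dir):
    """True if an explicit chmod() is required to end up with `wanted`."""
    return default_mode(umask, is_dir) != wanted


def current_umask():
    """Read the process umask; os.umask() sets a new mask, so put the old one back."""
    mask = os.umask(0)
    os.umask(mask)
    return mask
```

With the common umask 0o022, wanting 0o644 files and 0o755 directories needs no chmod at all. Only setups that deviate from the umask-derived defaults pay for the extra call.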
* Fix umask bug and do a code cleanup of chmod/mkdir usage to set the correct permissions | Troels Liebe Bentsen | 2006-02-24
  This should also fix problems with DokuWiki creating setuid files under some umasks.
  - Don't set the umask() anymore. This is not good form, and we don't really know what it is in the old code anyway, as it was not done properly.
  - Retire the dmask config option and introduce two new ones, fmode and dmode. This is more in line with POSIX and should make more sense.
  - Use chmod to set the correct permissions, but only if it's needed.
  - Turn changing of permissions off by default, as it should work properly in most Apache setups without it, and it does not make sense on Windows anyway.
  darcs-hash:20060224211655-ee6b9-68f7bb59417d6f0033cfd3764146923daa4dcf1b.gz
* Fix wrong umask usage so we set the correct file permissions. | Troels Liebe Bentsen | 2006-02-18
  darcs-hash:20060218183251-ee6b9-798ab2994526311b1e58f04e7684b39b51426887.gz
* use usleep in locking to avoid 100% CPU | Erik Byström | 2006-01-15
  darcs-hash:20060115105943-4b825-c15733992e9bbf26621d4431da3171bcb8d24057.gz
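The point of this change is that a lock-wait loop should sleep briefly between attempts instead of spinning. The sketch below assumes a directory-creation lock, where `os.mkdir` either creates the directory atomically or raises `FileExistsError`. That locking scheme, the function names and the timeout values are assumptions made for illustration. The commit only says usleep was added to the locking code.

```python
import os
import time


def acquire_lock(lock_dir, timeout=3.0, poll=0.05):
    """Try to take a directory-based lock, sleeping between attempts.

    Returns True once lock_dir was created by us, False after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.mkdir(lock_dir)  # atomic: fails if another process holds the lock
            return True
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll)  # yield the CPU, as usleep() does in PHP


def release_lock(lock_dir):
    """Release the lock by removing its directory."""
    os.rmdir(lock_dir)
```

Without the sleep, a waiting request would retry mkdir as fast as the CPU allows for the whole timeout, which is the 100% CPU symptom named in the commit subject.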
* more debugging code in indexer.php | Andreas Gohr | 2005-12-07
  darcs-hash:20051207193507-7ad00-5b9c870ae153b1c7adc8360822ffc6216be98809.gz
* added debug options to indexer.php for sitemap stuff | Andreas Gohr | 2005-12-03
  darcs-hash:20051203142519-7ad00-d72a5e338ecda2b819e8628444d2262d7458b8e2.gz
* automatic google ping after sitemap update | Andreas Gohr | 2005-11-29
  darcs-hash:20051129223742-7ad00-2b17207795c195d7194578007ef19029f0ed0f94.gz
* fixed date format for google sitemaps | Andreas Gohr | 2005-11-27
  darcs-hash:20051127110118-7ad00-691b4d529004ef0571896c3d326361970a584409.gz
* Added Google sitemap support #371 | Andreas Gohr | 2005-11-27
  This patch adds the automatic creation of Google sitemaps. The map is created in the DokuWiki root dir and named sitemap.xml.gz if gzip compression is available; if not, the .gz extension is skipped. How often the map is recreated is defined through the $conf['sitemap'] option, which takes a value in days.
  darcs-hash:20051126234709-7ad00-6ff4b0e79670cdfa39e615ec9dc40146ffcc9dd4.gz
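The sketch below builds a sitemap and writes it gzipped when compression is wanted, otherwise as plain `sitemap.xml`. It uses the sitemaps.org 0.9 namespace that the later 2006-11-17 sitemapper update switched to, and a W3C datetime `lastmod` in UTC. The `(url, mtime)` input shape and the function names are hypothetical. The real generator also decides which pages to list and when to rebuild.

```python
import gzip
from datetime import datetime, timezone
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """pages: iterable of (url, unix_mtime) pairs. Returns the sitemap XML text."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<urlset xmlns="{SITEMAP_NS}">']
    for url, mtime in pages:
        # W3C datetime format, always expressed in UTC here
        lastmod = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%S+00:00")
        lines.append(f"  <url><loc>{escape(url)}</loc>"
                     f"<lastmod>{lastmod}</lastmod></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def write_sitemap(path, xml, use_gzip=True):
    """Write path + '.gz' when gzip is wanted, else plain path; return the path used."""
    if use_gzip:
        path += ".gz"
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            fh.write(xml)
    else:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(xml)
    return path
```

Escaping the URL matters because wiki links commonly contain `&` query separators, which must appear as `&amp;` in the XML.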
* added command line utility to update the index | Andreas Gohr | 2005-10-16
  darcs-hash:20051016001228-7ad00-5f9c0176e9d9830ec22332504e7d415bd4a20a1d.gz
* indexer_cleanid_patch | hfuecks | 2005-10-15
  darcs-hash:20051015203821-e96b6-907a58698b3b566f0997f8ef58e1259abff769cc.gz
* load indexing include only when needed | Andreas Gohr | 2005-10-06
  darcs-hash:20051006174104-7ad00-4abd8894c1449a46467c0d168e7fc5e90331024c.gz
* indexer_patch_flush_image | hfuecks | 2005-10-06
  darcs-hash:20051006130651-e96b6-6496b235c56a40cdea06df6198a5d39e5bfa9d13.gz
* mkdir compatibility fix in indexer #575 | Andreas Gohr | 2005-09-30
  darcs-hash:20050930151407-7ad00-56002a89c36a82a249de577227929ace91ebad2f.gz
* small indexer fix | Andreas Gohr | 2005-09-13
  The indexer didn't create the last indexed files correctly.
  darcs-hash:20050913184004-7ad00-8756a7362942c747d53992fa8f0ee4da5534badb.gz
* added missing indexer file | Andreas Gohr | 2005-08-23
  darcs-hash:20050823163450-7ad00-5ed5b87ee1898281090bb3170498866dbc18cb24.gz