path: root/inc/utf8.php
Commit message | Author | Age
* Added class-exitsts | SC Yoo | 2015-02-20
* Normalization is required to manage multibyte characters. | SC Yoo | 2015-02-17
    OS X uses Unicode NFD, so normalization is required to manage multibyte characters ( http://unicode.org/reports/tr15/ ). Without it, DokuWiki can't find a file uploaded from OS X with a multibyte filename like '도쿠위키.jpg'.
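The NFD/NFC mismatch described in this commit can be illustrated with a minimal sketch. It is written in Python rather than PHP and is not the code from the commit; the helper name `normalize_filename` is hypothetical, and the choice of NFC as the target form is an assumption.

```python
import unicodedata


def normalize_filename(name: str) -> str:
    """Return the NFC (composed) form of a filename (hypothetical helper)."""
    return unicodedata.normalize("NFC", name)


# The composed form, as a wiki would typically store or look up the name.
composed = unicodedata.normalize("NFC", "도쿠위키.jpg")

# The decomposed form, as an NFD-based file system would report it.
decomposed = unicodedata.normalize("NFD", composed)

# The two strings render identically but do not compare equal ...
assert decomposed != composed
assert len(decomposed) > len(composed)

# ... until both sides are brought to the same normalization form.
assert normalize_filename(decomposed) == composed
```

Without the normalization step, a plain string comparison between the stored name and the name reported by the file system fails, which matches the "file not found" symptom the commit message describes.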
* PHPDocs and some improvements | Gerrit Uitslag | 2014-10-02
* more PHPDocs, unused var, small bit code reformatting | Gerrit Uitslag | 2014-10-01
* check for unicode preg capabilities in UTF-8 lib FS#2636 | Andreas Gohr | 2012-11-12
    We now have two defines for checking for UTF-8 and Unicode property support in PREG and use them to work around FS#2636 on older systems.
* more utf8_basename fixes | Andreas Gohr | 2012-07-29
* Fix utf8_basename for files in the root directory | Michael Hamann | 2012-07-29
* fix utf8_basename for file names without any directory | Andreas Gohr | 2012-07-29
* added utf8_basename() | Andreas Gohr | 2012-07-28
    This is a locale-independent version of basename to work around https://bugs.php.net/bug.php?id=37738
    The function is not yet used anywhere. It should at least be used wherever non-ASCII filenames and paths are handled. Simply replacing all calls to basename() with this function might be the safest approach.
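A locale-independent basename can be sketched as plain string splitting, with no dependence on locale settings. The following Python sketch is an illustration only, not the PHP implementation from this commit; the function name `utf8_basename_sketch`, the treatment of backslashes as separators, and the suffix handling are all assumptions.

```python
def utf8_basename_sketch(path: str, suffix: str = "") -> str:
    """Return the last path component without consulting any locale.

    Assumptions (not taken from the commit): both '/' and '\\' count as
    separators, trailing separators are ignored, and a suffix is stripped
    only when it is not the whole name.
    """
    trimmed = path.replace("\\", "/").rstrip("/")
    name = trimmed.rsplit("/", 1)[-1]
    if suffix and name.endswith(suffix) and name != suffix:
        name = name[: -len(suffix)]
    return name


# The cases named in the follow-up fixes: a nested path, a file in the
# root directory, and a file name without any directory.
assert utf8_basename_sketch("/data/media/ファイル.txt") == "ファイル.txt"
assert utf8_basename_sketch("/ファイル.txt") == "ファイル.txt"
assert utf8_basename_sketch("ファイル.txt") == "ファイル.txt"
assert utf8_basename_sketch("/data/media/ファイル.txt", ".txt") == "ファイル"
```

Because the split works on separators only, multibyte characters at the start of a file name are never dropped, which is the failure mode the linked PHP bug report concerns.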
* some phpdoc updates | Andreas Gohr | 2012-06-23
* coding style updates | Andreas Gohr | 2012-03-16
* FS#2291 replace space with non-breaking space in utf8 special chars | Christopher Smith | 2011-10-15
* Transliteration for Sanskrit diacritics FS#2246 | Eivind Morland | 2011-08-01
* Corrected contact email of Andreas Haerter | Michael Hamann | 2010-09-18
* Clariefied license (clean version) | Andreas Gohr | 2010-06-21
* Revert "inc/utf8.php license clarified for Debian project" | Andreas Gohr | 2010-06-21
    This was an edit through the github interface which changed more than intended.
    This reverts commit 1720a8e9a67df95c104eb02146c98a3d9da1f84b.
* inc/utf8.php license clarified for Debian project | CosmoCode GmbH | 2010-06-21
* new fnencode option FS#1649 | Andreas Gohr | 2010-04-04
    This patch adds an option to choose how filenames are encoded when saved to the file system. You can choose between urlencoding (url), the new SafeFn method (safe) and storing real UTF-8 (utf-8).
* Coding Standard Cleanup | Andreas Gohr | 2009-10-20
    Ignore-this: 259cb5773c3144c6c706d87298dcf674
    darcs-hash:20091020212338-7ad00-6bf1c5c403491f136a1c02af5ecd9f84d7227107.gz
* do not recalculate strlen in each loop in utf8 lib | Andreas Gohr | 2009-10-20
    Ignore-this: 2e2f6983f0c1b891825b0c1954b7727d
    darcs-hash:20091020201938-7ad00-7b5501c2acc9f5ac280e73d25e1cccbcb3237356.gz
* Whitespace cleanup FS#1709 | furun | 2009-10-16
    Ignore-this: 27ea52110bce929b2c61ed8faba67cfc
    darcs-hash:20091016205526-c0bf4-35eba4e65d37980a667ba982f7f1ea5b7b07f01c.gz
* rollback of the rollback... yes really | Andreas Gohr | 2009-07-26
    Ignore-this: 77a6ae8bee651ddb193e0ed84cbe3667
    Okay, so it turned out Chris' test had a bug and wasn't really testing and my test was skewed by disk caching (remember: always run your performance tests multiple times).
    rolling back:
    Sat Jul 25 12:44:59 CEST 2009 Andreas Gohr <andi@splitbrain.org>
      * rollback of the utf8_isASCII() patch
        Tests showed the old code was faster and I was too stupid to read the test results
        rolling back:
        Fri Jul 24 10:40:09 CEST 2009 Andreas Haerter <netzmeister@andreas-haerter.de>
          * Much faster version of utf8_isASCII()
            This version uses a non-capturing regular expression instead of looping through all characters of the string.
        M ./inc/utf8.php -5 +2
    M ./inc/utf8.php -2 +5
    darcs-hash:20090726191841-7ad00-13950d9c528abd51f5680c6841ec738a0ee72130.gz
* rollback of the utf8_isASCII() patch | Andreas Gohr | 2009-07-25
    Ignore-this: e5afeb833d0e0b0bf05ff5f497a3130d
    Tests showed the old code was faster and I was too stupid to read the test results
    rolling back:
    Fri Jul 24 10:40:09 CEST 2009 Andreas Haerter <netzmeister@andreas-haerter.de>
      * Much faster version of utf8_isASCII()
        This version uses a non-capturing regular expression instead of looping through all characters of the string.
    M ./inc/utf8.php -5 +2
    darcs-hash:20090725104459-7ad00-c4849ca67293083fee8021c2c198dab1dcb435a2.gz
* Much faster version of utf8_isASCII() | Andreas Haerter | 2009-07-24
    Ignore-this: 1adbc2b33e76b3a76e650c340e9644e6
    This version uses a non-capturing regular expression instead of looping through all characters of the string.
    darcs-hash:20090724084009-2cb76-ad1630c7aca53f0bdb596525b0693304a9b4cc88.gz
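The two approaches this commit compares, a byte-by-byte loop and a single regular-expression search, can be sketched as follows. This is a Python illustration, not the PHP code from the patch; the function names and the exact pattern are assumptions.

```python
import re

# Assumed pattern: any byte with the high bit set means "not ASCII".
_NON_ASCII = re.compile(rb"[\x80-\xff]")


def is_ascii_loop(data: bytes) -> bool:
    """Check every byte in turn and stop at the first non-ASCII one."""
    for byte in data:
        if byte > 0x7F:
            return False
    return True


def is_ascii_regex(data: bytes) -> bool:
    """Let the regex engine scan for the first non-ASCII byte."""
    return _NON_ASCII.search(data) is None


samples = [b"", b"plain ascii", "d\u00f6kuwiki".encode("utf-8")]
for sample in samples:
    # Both variants must agree; only their speed may differ.
    assert is_ascii_loop(sample) == is_ascii_regex(sample)
```

Which variant is faster depends on the runtime and the input, so any comparison should be timed over several runs.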
* function_exists checks in utf8 lib for compatibility with 3rd party libs | Andreas Gohr | 2009-01-14
    The DokuWiki UTF-8 library and its derivatives are very popular in Open Source PHP software. This causes trouble due to name clashes when 3rd party software libraries need to be loaded within DokuWiki. A common example is using the authentication libraries of popular forum systems (PHP3, PunBB3).
    With the checks added, DokuWiki will rely on the 3rd party UTF-8 functions instead of its own ones. As long as they are really the same, this will work. Users of 3rd party libs need to check compatibility between implementations themselves.
    darcs-hash:20090114201824-7ad00-40fcc2e1abec42adabef5596a6617fbaa22291d5.gz
* strip special spaces FS#1539 | Andreas Gohr | 2009-01-12
    darcs-hash:20090112193617-7ad00-824d6a71ca9b5c067fa09e58daf915473f361ed8.gz
* more placeholders for namespace templates | Andreas Gohr | 2008-10-26
    This patch adds a @FILE@ placeholder for namespace templates which is similar to the @PAGE@ placeholder but keeps underscores intact. It also adds placeholder to insert the page name with a first uppercase character, all words uppercased or the whole string uppercased.
    The utf8 library was enhanced with utf8_ucfirst and utf8_ucwords functions
    darcs-hash:20081026084239-7ad00-1a4be6bb85280df025ca308d4ed2e50da1cbc9cf.gz
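The behaviour that utf8_ucfirst and utf8_ucwords are named for can be sketched as below. This is a Python illustration under stated assumptions, not the PHP code added by the patch: the `_sketch` function names are hypothetical, and words are assumed to be separated by single spaces.

```python
def utf8_ucfirst_sketch(text: str) -> str:
    """Uppercase the first character, including non-ASCII letters."""
    return text[:1].upper() + text[1:]


def utf8_ucwords_sketch(text: str) -> str:
    """Uppercase the first character of every space-separated word."""
    return " ".join(utf8_ucfirst_sketch(word) for word in text.split(" "))


# A byte-oriented ucfirst would leave the multibyte first letter untouched.
assert utf8_ucfirst_sketch("émile zola") == "Émile zola"
assert utf8_ucwords_sketch("émile zola") == "Émile Zola"
assert utf8_ucfirst_sketch("") == ""
```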
* do not treat greek as special chars FS#1492 | Andreas Gohr | 2008-10-12
    Will treat only mathematical greek as special. Changed toolbar picker to use mathematical symbols.
    darcs-hash:20081012153950-7ad00-a2a4e8cf705aff689d405ccb4015f1b75a0045cf.gz
* Do romanization of certain characters different from what deaccent does FS#1117 | Andreas Gohr | 2008-10-11
    Some characters are deaccented/romanized differently in different languages; we now do it one way in deaccent and the other way in romanize, giving the user a choice of what she prefers. (Currently affects a handful of Scandinavian letters.)
    darcs-hash:20081011091034-7ad00-08535e03639b0b0c634e2438609ac10545f14f48.gz
* Last fixes for Japanese Romanization. Now all 22893 tests succeed. | Andreas Gohr | 2008-06-08
    darcs-hash:20080608113523-7ad00-81e25091d59c2333f4f82f1cf61321155b03f895.gz
* Japanese romanization update | Andreas Gohr | 2008-05-08
    Down to 57 fails
    darcs-hash:20080508212444-7ad00-16286e9f5be2bbbd3069d5c22ab8c270b2e1b23e.gz
* Updates for Japanese romanization support FS#1363 | Andreas Gohr | 2008-05-06
    This patch adds some fixes for the romanization lookup table and a test case for more than 20000 phrases and their correct romanization. About 2100 tests currently fail.
    darcs-hash:20080506203707-7ad00-9d95b8af459fa44c8d3e95560c7e1c116b8ffc48.gz
* Fixes for Japanese romanization FS#1363 | Denis Scheither | 2008-04-07
    darcs-hash:20080407174238-84fef-88cae1548503760595a19f00e03060604303b934.gz
* utf8_trim bugfix | Andreas Gohr | 2007-11-02
    Fixes the utf8_trim() function when a charlist is given
    darcs-hash:20071102181430-7ad00-4160d3d47b53e9c0db76328004c1f95c76d590e6.gz
* fixed Thai romanization | Andreas Gohr | 2007-10-15
    darcs-hash:20071015170603-7ad00-cce18a874fa1857af1717519cac14e86f986c7f2.gz
* removed unnessary UTF-8 replacement functions | Andreas Gohr | 2007-07-19
    darcs-hash:20070719130041-7ad00-84d00f6385973e6f2f9499374c3c1d475eecb715.gz
* several speed improvements in UTF-8 lib | Andreas Gohr | 2007-07-19
    darcs-hash:20070719110142-7ad00-1192e190c62637ed68e2c2c0a0b3135abfd6ecb5.gz
* Escape Ctrl-Z so darcs stops treating utf8.php as binary. | Tom N Harris | 2007-03-23
    darcs-hash:20070323030243-6942e-836105b95078b213df8497386ae9b0418fcf29be.gz
* Encode/Decode numeric HTML entities correctly. | Tom N Harris | 2007-02-02
    utf8_tohtml handles all codepoints, and the inverse function, utf8_unhtml, is added.
    darcs-hash:20070202070509-6942e-09ed9dc37f1469055a7c04d44044768e160d60e6.gz
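The round trip between characters and numeric HTML entities can be sketched as follows. This is a Python illustration, not the PHP implementation of utf8_tohtml and utf8_unhtml; the function names are hypothetical, and both the decision to leave ASCII untouched and the acceptance of hexadecimal entities on decoding are assumptions.

```python
import re

# Decimal (&#233;) and hexadecimal (&#xE9;) numeric entities.
_ENTITY = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric entity."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def from_numeric_entities(text: str) -> str:
    """Inverse of to_numeric_entities; invalid code points stay as-is."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave surrogates and out-of-range values untouched.
        if codepoint > 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
            return match.group(0)
        return chr(codepoint)

    return _ENTITY.sub(replace, text)


# Code points above U+FFFF are the cases a naive encoder tends to get wrong.
original = "caf\u00e9 \U0001F600"
encoded = to_numeric_entities(original)
assert encoded == "caf&#233; &#128512;"
assert from_numeric_entities(encoded) == original
assert from_numeric_entities("&#xE9;") == "\u00e9"
```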
* tf_rename_lower.patch | henning.noren | 2007-01-03
    Name the TRUE/FALSE-constants consistently as lowercase everywhere. This might also be a tiny optimization in some environments.
    darcs-hash:20070103205700-d2a3e-e7ec0aedb938d563f583116a2d5b17f3a3fea36c.gz
* Indexer asian language fixes and speed-ups | Tom N Harris | 2006-11-17
    Make Chinese and Japanese work better with the new indexer. Some missing punctuation added to utf8_stripspecials. Misc. other changes to make indexing faster. The indexes will expire on backend upgrades, so you don't have to delete *.indexed
    darcs-hash:20061117123032-6942e-774b38e08234928c49b37e40addba375acf67ac0.gz
* do not transliterate cyrillic soft sign #958 | Andreas Gohr | 2006-10-28
    darcs-hash:20061028113426-7ad00-f1d6b3b919c3aadd2bd7585fb772071b81b4b42d.gz
* more utf8_substr improvements (re FS#891 and yesterday's patch) | chris | 2006-09-28
    - rework utf8_substr() NOMBSTRING code to always use pcre
    - remove work around for utf8_substr() and large strings from ft_snippet()
    darcs-hash:20060928165122-9b6ab-0eefc216f07f9d7e7d8eb62ce26605c28ee340fa.gz
* utf8_substr fix for FS#891 | chris | 2006-09-27
    darcs-hash:20060927033713-9b6ab-4b35e0a85b6d11d5a3a98858cd2f860b383ff153.gz
* utf8_stripspecials optimization | chris | 2006-09-23
    Add preconverted utf-8 string of special characters.
    The (once only) conversion of the special character unicode array into utf-8 occurs on every DokuWiki page view, irrespective of action or caching, and takes about one third of the time involved in delivering a wiki page straight from cache.
    The original unicode array has been left in place in the file to make any future amendments easier.
    darcs-hash:20060923151937-9b6ab-cae0340a95d9596415ef71d7b7e67ef9daca84ef.gz
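The optimization described here, shipping a preconverted string next to the code point array it was derived from, can be sketched as below. This is a Python illustration only; the four code points are a hypothetical stand-in for the real special-character table, and all names are assumptions.

```python
# Hypothetical excerpt of a special-character table, kept as code points
# so that future amendments stay easy to read and edit.
SPECIAL_CODEPOINTS = [0x00A0, 0x2026, 0x201C, 0x201D]

# Preconverted form of the same table, written out once as a literal so
# that no conversion work is needed when the module is loaded.
SPECIAL_CHARS = "\u00a0\u2026\u201c\u201d"


def build_special_chars(codepoints: list) -> str:
    """The conversion the preconverted literal makes unnecessary at runtime."""
    return "".join(chr(cp) for cp in codepoints)


# A consistency check guards against the two tables drifting apart.
assert SPECIAL_CHARS == build_special_chars(SPECIAL_CODEPOINTS)

_STRIP_TABLE = str.maketrans({ch: "" for ch in SPECIAL_CHARS})


def strip_specials_sketch(text: str) -> str:
    """Remove every character listed in the preconverted table."""
    return text.translate(_STRIP_TABLE)


assert strip_specials_sketch("\u201cwiki\u201d\u2026") == "wiki"
```

Keeping both forms trades a little duplication for speed, and the consistency check can live in a unit test instead of the request path.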
* search improvements | chris | 2006-08-31
    ft_snippet()
    - make utf8 algorithm default
    - add workaround for utf8_substr() limitations, bug #891
    - fix some indexes which missed out on conversion to utf8 character counts
    - minor improvements
    idx_lookup()
    - minor changes to wildcard matching code to improve performance (changes based on profiling results)
    utf8
    - specifically set mb_internal_coding to utf-8 when mb_string functions will be used.
    darcs-hash:20060831003413-9b6ab-712021eda3c959ffe79d8d3fe91d2c9a8acf2b58.gz
* further update to global memory cache arrays | chris | 2006-08-29
    - remove initialisation of caches in inc/pageutils.php
    - add global declaration to init.php to support init.php being included from within a function, e.g. unit testing ;-)
    - minor change to utf8_substr, remove non-essential brackets added as part of an earlier patch
    darcs-hash:20060829134806-9b6ab-ab15191344a83be664c412403dc84a24fa2253a2.gz
* utf8_substr() fix, it wasn't using mb_substr results when available | chris | 2006-08-28
    darcs-hash:20060828092029-9b6ab-f76c94b76ce1ada49e2fefde11af824bb98b99c7.gz
* utf8_correctIdx bounds checking and more unittests | chris | 2006-08-27
    darcs-hash:20060827153254-9b6ab-3c76fde7cb5534ca12628e9aa6e6d59d9bb02f45.gz
* ft_snippet() update, fix utf8 problems | chris | 2006-08-26
    darcs-hash:20060826095311-9b6ab-9a6f272cc7c7532eb2bad8f7b4404c5a16b71109.gz