* feat: improve search tokenization for CJK languages
Enhance the encoder function to properly tokenize CJK (Chinese, Japanese,
Korean) characters while maintaining English word tokenization. This fixes
search issues where CJK text was not searchable due to whitespace-only
splitting.
Changes:
- Tokenize CJK characters (Hiragana, Katakana, Kanji, Hangul) individually
- Preserve whitespace-based tokenization for non-CJK text
- Support mixed CJK/English content in search queries
This addresses the CJK search issues reported in #2109 where Japanese text
like "て以来" was not searchable because the encoder only split on whitespace.
Tested with Japanese, Chinese, and Korean content to verify character-level
tokenization works correctly while maintaining English search functionality.
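A minimal sketch of the tokenization rule described in this commit, not the encoder as committed. The name `encodeSimple`, the `CJK_PATTERN` ranges, and the lowercasing step are assumptions for illustration; the real encoder may cover more blocks.

```typescript
// Assumed ranges: Hiragana, Katakana, CJK Unified Ideographs, Hangul Syllables.
const CJK_PATTERN = /[\u3040-\u309f\u30a0-\u30ff\u4e00-\u9fff\uac00-\ud7af]/

// Hypothetical encoder: one token per CJK character, whitespace-delimited
// tokens for everything else.
function encodeSimple(text: string): string[] {
  const tokens: string[] = []
  // Lowercasing is an assumption; the commit message does not mention it.
  for (const word of text.toLowerCase().split(/\s+/)) {
    let buffer = ""
    for (const char of word) {
      if (CJK_PATTERN.test(char)) {
        // Flush any pending non-CJK run, then emit the CJK character alone.
        if (buffer) {
          tokens.push(buffer)
          buffer = ""
        }
        tokens.push(char)
      } else {
        buffer += char
      }
    }
    if (buffer) tokens.push(buffer)
  }
  return tokens
}
```

With this rule, a query such as "て以来" yields three single-character tokens, so it can match inside a longer unspaced sentence.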
* perf: optimize CJK search encoder with manual buffer tracking
Replace regex-based tokenization with index-based buffer management.
This improves performance by ~2.93x according to benchmark results.
- Use explicit buffer start/end indices instead of string concatenation
- Replace split(/\s+/) with direct whitespace code point checks
- Remove redundant filter() operations
- Add CJK Extension B support (U+20000-U+2A6DF)
Performance: ~878ms → ~300ms (100 iterations, mixed CJK/English text)
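The index-based approach can be sketched roughly as follows. This is an illustration under assumptions, not the committed code: `encodeBuffered`, `isCJK`, and `isWhitespace` are hypothetical names, and the range list is only a plausible subset.

```typescript
// Assumed code point ranges, for illustration only.
function isCJK(cp: number): boolean {
  return (
    (cp >= 0x3040 && cp <= 0x309f) || // Hiragana
    (cp >= 0x30a0 && cp <= 0x30ff) || // Katakana
    (cp >= 0x4e00 && cp <= 0x9fff) || // CJK Unified Ideographs
    (cp >= 0xac00 && cp <= 0xd7af) || // Hangul Syllables
    (cp >= 0x20000 && cp <= 0x2a6df) // supplementary-plane range listed in the commit
  )
}

// Direct code point checks instead of split(/\s+/): space, tab, LF, CR.
function isWhitespace(cp: number): boolean {
  return cp === 32 || cp === 9 || cp === 10 || cp === 13
}

function encodeBuffered(text: string): string[] {
  const tokens: string[] = []
  let start = -1 // start index of the pending non-CJK run, or -1 if none
  let i = 0
  while (i < text.length) {
    const cp = text.codePointAt(i)!
    const width = cp > 0xffff ? 2 : 1 // astral code points occupy two UTF-16 units
    if (isWhitespace(cp) || isCJK(cp)) {
      // Both whitespace and CJK characters end the pending run.
      if (start !== -1) {
        tokens.push(text.slice(start, i))
        start = -1
      }
      if (!isWhitespace(cp)) tokens.push(text.slice(i, i + width))
    } else if (start === -1) {
      start = i
    }
    i += width
  }
  if (start !== -1) tokens.push(text.slice(start))
  return tokens
}
```

Tracking only `start` and slicing once per token avoids building strings character by character, and empty tokens are never produced, so no `filter()` pass is needed.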
* test: add comprehensive unit tests for CJK search encoder
Add 21 unit tests covering:
- English word tokenization
- CJK character-level tokenization (Japanese, Korean, Chinese)
- Mixed CJK/English content
- Edge cases
All tests pass, confirming the encoder correctly handles CJK text.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* chore(deps): update flexsearch to version 0.8.205 and adjust search encoder.
* refactor(search): enhance search encoder and update search results type
- Improved the encoder function to filter out empty tokens.
- Updated the search results type from a specific FlexSearch type to a more generic 'any' type for flexibility.
- Removed redundant rtl property from the index configuration.
* refactor(search): remove rtl property from search index configuration
* refactor(search): improve encoder function formatting
- Updated the encoder function to use consistent arrow function syntax for better readability.
* refactor(search): update search results type to DefaultDocumentSearchResults
- Imported DefaultDocumentSearchResults from FlexSearch for improved type safety.
- Changed the type of searchResults from 'any' to DefaultDocumentSearchResults<Item> for better clarity and maintainability.
* Use a `<button>` for search
* Fix search button styles to match preexisting styling
* Remove additional native button properties.
* Invoke search button on click or keyboard.
* Reorganize search button DOM hierarchy
* Restore focus to the search button when exiting the search overlay
* Run prettier on Search.tsx
* feat(search): add search by title/content index and tag at the same time
* fix(search): set search type to basic and remove tag from term for proper highlighting and scroll when searched by tag and title/content index
* fix(search): use indexOf to find space so it is easier to read
* fix(search): trim trailing whitespace before splitting
* fix(search): set limit to 10000 for combined search mode (to make filter by tag more accurate)
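A rough sketch of how a combined tag and title/content term could be split, following the steps named in the commits above (trim, then `indexOf` to find the first space). The `#` prefix for tag searches, the `ParsedQuery` shape, and the function name `parseSearchTerm` are assumptions made for illustration.

```typescript
// Hypothetical result shape: `tag` is null for a plain title/content search.
interface ParsedQuery {
  tag: string | null
  query: string
}

function parseSearchTerm(term: string): ParsedQuery {
  // Trim first so a trailing space does not produce an empty query part.
  // (The commit mentions trailing whitespace; trimming both ends is a superset.)
  const trimmed = term.trim()
  if (!trimmed.startsWith("#")) {
    return { tag: null, query: trimmed }
  }
  const body = trimmed.slice(1)
  const space = body.indexOf(" ")
  if (space === -1) {
    // Tag-only search, e.g. "#recipes".
    return { tag: body, query: "" }
  }
  // Combined search: the tag comes before the first space, the rest is the query.
  return { tag: body.slice(0, space), query: body.slice(space + 1).trim() }
}
```

In a combined search, the query part would go to the title/content index and the results would then be filtered by tag, which is presumably why a high result limit makes the tag filter more accurate.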
* fix: alt text getting mixed up with height/width
More granular detection of alt text and resize dimensions in images
* fix: format
* feat: init i18n
* feat: add translation
* style: prettier for test
* fix: build the locale so it can be merged with dateLocale
* style: run prettier
* remove cursed file
* refactor: remove i18n library and use locale way instead
* format with prettier
* forgot to remove test
* prevent merging error
* format
* format
* fix: allow string for locale
- Check during translation if valid / existing locale
- Allow to use "en" and "en-US" for example
- Add fallback directly in the function
- Add default key in the function
- Add docstring to cfg.ts
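The lookup behavior listed above might look something like this sketch. The translation tables, `DEFAULT_LOCALE`, `resolveLocale`, and `translate` are hypothetical names and data, and falling back to the key itself is one reading of "default key", not a confirmed detail.

```typescript
type Translations = Record<string, string>

// Hypothetical translation tables keyed by full locale.
const locales: Record<string, Translations> = {
  "en-US": { search: "Search", noResults: "No results" },
  "fr-FR": { search: "Rechercher" },
}
const DEFAULT_LOCALE = "en-US"

// Accept both "en-US" and "en": exact match first, then match on the language part.
function resolveLocale(locale: string): string {
  const known = Object.keys(locales)
  if (known.includes(locale)) return locale
  const lang = locale.split("-")[0].toLowerCase()
  const match = known.find((key) => key.split("-")[0].toLowerCase() === lang)
  return match ?? DEFAULT_LOCALE
}

// Fall back to the default locale, then to the key itself (assumed behavior).
function translate(key: string, locale: string = DEFAULT_LOCALE): string {
  const table = locales[resolveLocale(locale)]
  return table[key] ?? locales[DEFAULT_LOCALE][key] ?? key
}
```

Keeping the fallback inside the function means callers never have to check whether a locale or key exists before translating.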
* forgot item translation
* remove unused locale variable
* forgot to remove fr-FR testing
* format
* feat(search): add arrow navigation
* chore: format
* refactor: simplify arrow navigation
* chore: remove comment
* feat: rework arrow navigation to work without state
* feat: make pressing enter work with arrow navigation
* fix: remove unused css class
* chore: correct comment
* refactor(search): use optional chaining
* Quartz sync: Aug 29, 2023, 10:17 PM
* style: add basic style to tags in search
* feat: add SearchType + tags to search preview
* feat: support multiple matches
* style(search): add style to matching tags
* feat(search): add content to preview for tag search
* fix: only display tags on tag search
* feat: support basic + tag search
* refactor: extract common `fillDocument`, format
* feat: add hotkey to search for tags
* chore: remove logs
* fix: don't render empty `<ul>` if tags not present
* fix(search-tag): make case insensitive
* refactor: clean `hideSearch` and `showSearch`
* feat: trim content similar to `description.ts`
* fix(search-tag): hotkey for windows
* perf: re-use main index for tag search