summaryrefslogtreecommitdiff
path: root/fs/unicode/mkutf8data.c
AgeCommit message (Collapse)Author
2024-12-14Revert "unicode: Don't special case ignorable code points"Linus Torvalds
[ Upstream commit 231825b2e1ff6ba799c5eaf396d3ab2354e37c6b ] This reverts commit 5c26d2f1d3f5e4be3e196526bead29ecb139cf91. It turns out that we can't do this, because while the old behavior of ignoring ignorable code points was most definitely wrong, we have case-folding filesystems with on-disk hash values with that wrong behavior. So now you can't look up those names, because they hash to something different. Of course, it's also entirely possible that in the meantime people have created *new* files with the new ("more correct") case folding logic, and reverting will just make other things break. The correct solution is to not do case folding in filesystems, but sadly, people seem to never really understand that. People still see it as a feature, not a bug. Reported-by: Qi Han <hanqi@vivo.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219586 Cc: Gabriel Krisman Bertazi <krisman@suse.de> Requested-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-09unicode: Don't special case ignorable code pointsGabriel Krisman Bertazi
We don't need to handle them separately. Instead, just let them decompose/casefold to themselves. Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-06-20unicode: add MODULE_DESCRIPTION() macrosJeff Johnson
Currently 'make W=1' reports: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/unicode/utf8data.o WARNING: modpost: missing MODULE_DESCRIPTION() in fs/unicode/utf8-selftest.o Add a MODULE_DESCRIPTION() to utf8-selftest.c and utf8data.c_shipped, and update mkutf8data.c to add a MODULE_DESCRIPTION() to any future generated utf8data file. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240524-md-unicode-v1-1-e2727ce8574d@quicinc.com Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2021-10-12unicode: Add utf8-data moduleChristoph Hellwig
utf8data.h contains a large database table which is an auto-generated decodification trie for the unicode normalization functions. Allow building it into a separate module. Based on a patch from Shreeya Patel <shreeya.patel@collabora.com>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
2019-04-28unicode: refactor the rule for regenerating utf8data.hMasahiro Yamada
scripts/mkutf8data is used only when regenerating utf8data.h, which never happens in the normal kernel build. However, it is irrespectively built if CONFIG_UNICODE is enabled. Moreover, there is no good reason for it to reside in the scripts/ directory since it is only used in fs/unicode/. Hence, move it from scripts/ to fs/unicode/. In some cases, we bypass build artifacts in the normal build. The conventional way to do so is to surround the code with ifdef REGENERATE_*. For example, - 7373f4f83c71 ("kbuild: add implicit rules for parser generation") - 6aaf49b495b4 ("crypto: arm,arm64 - Fix random regeneration of S_shipped") I rewrote the rule in a more kbuild'ish style. In the normal build, utf8data.h is just shipped from the check-in file. $ make [ snip ] SHIPPED fs/unicode/utf8data.h CC fs/unicode/utf8-norm.o CC fs/unicode/utf8-core.o CC fs/unicode/utf8-selftest.o AR fs/unicode/built-in.a If you want to generate utf8data.h based on UCD, put *.txt files into fs/unicode/, then pass REGENERATE_UTF8DATA=1 from the command line. The mkutf8data tool will be automatically compiled to generate the utf8data.h from the *.txt files. $ make REGENERATE_UTF8DATA=1 [ snip ] HOSTCC fs/unicode/mkutf8data GEN fs/unicode/utf8data.h CC fs/unicode/utf8-norm.o CC fs/unicode/utf8-core.o CC fs/unicode/utf8-selftest.o AR fs/unicode/built-in.a I renamed the check-in utf8data.h to utf8data.h_shipped so that this will work for the out-of-tree build. You can update it based on the latest UCD like this: $ make REGENERATE_UTF8DATA=1 fs/unicode/ $ cp fs/unicode/utf8data.h fs/unicode/utf8data.h_shipped Also, I added entries to .gitignore and dontdiff. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>