Description
Apache Lucene 9.4 will support the Java 19 Panama APIs to mmap index files (using an MR-JAR). See apache/lucene#912 for more information.
As those APIs are not yet enabled by default in the JDK, we still have to use an opt-in approach, controlled via the Java command line:
- By default, Lucene uses the old implementation based on `MappedByteBuffer` and several hacks, which also risks crashing the JVM if an open index is closed from another thread while a search is running (this is well known). If Java 19 is detected, Lucene logs a warning through JUL when `MMapDirectory` is initialized (see below).
- If you pass `--enable-preview` on the Java command line (next to the heap settings), it enables preview APIs in the JDK (https://openjdk.org/jeps/12). Lucene detects this and switches `MMapDirectory` to a new implementation, `MemorySegmentIndexInput`, which uses those new APIs for the inputs (at the moment it also logs this as an "info" message to JUL). The new APIs are safe and can no longer crash the JVM. Most importantly, all index files are now mapped into memory in portions of 16 GiB instead of 1 GiB. In fact, unless an index is force-merged to one segment, every index file will then consist of only one memory mapping spanning the whole file! This helps HotSpot optimize reading further, as only one implementation of `MemorySegmentIndexInput` is used. In addition, the number of mappings is dramatically reduced (approximately 5 times fewer mappings, because the maximum segment size is 5 gigabytes by default and each such segment now uses one mapping instead of 5). This may allow users to stop changing sysctl settings (see `max_map_count` at https://opensearch.org/docs/latest/opensearch/install/important-settings/) and go with the OS defaults. Alternatively, users may host more indexes with many more segments on one node.
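To make the mapping arithmetic concrete, here is a minimal sketch that counts how many mappings a file needs for a given maximum chunk size. It is illustrative only and not Lucene code; the class and method names (`MappingCount`, `mappingsFor`) are made up for this example, and the 1 GiB and 16 GiB chunk sizes are the figures quoted above.

```java
/** Illustrative arithmetic only; this is not how Lucene is implemented. */
final class MappingCount {
    static final long GIB = 1L << 30;

    /** Number of mmap chunks needed to cover a file (ceiling division). */
    static long mappingsFor(long fileLength, long maxChunkSize) {
        if (fileLength <= 0 || maxChunkSize <= 0) {
            throw new IllegalArgumentException("lengths must be positive");
        }
        return (fileLength + maxChunkSize - 1) / maxChunkSize;
    }

    public static void main(String[] args) {
        long defaultMaxSegment = 5 * GIB;  // default maximum segment size quoted above
        long forceMerged = 40 * GIB;       // hypothetical force-merged single segment

        // Old chunk size (1 GiB) versus new chunk size (16 GiB).
        System.out.println(mappingsFor(defaultMaxSegment, 1 * GIB));   // 5
        System.out.println(mappingsFor(defaultMaxSegment, 16 * GIB));  // 1
        System.out.println(mappingsFor(forceMerged, 16 * GIB));        // 3
    }
}
```

The last line shows why the force-merge case is the exception: a single file larger than 16 GiB still needs more than one mapping.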
Some TODOs:
- Make sure that OpenSearch also redirects everything logged via `java.util.logging` (JUL) to its own log file, so that it does not land on the console. This can be done with log4j by adding the log4j-jul adapter and installing it via a system property in the Bootstrap classes. I have not checked whether this is already done. The reason is that Apache Lucene has logged some events using `java.util.logging` since Lucene 9.0. Some of those events are `MMapDirectory` messages (e.g., when unmapping did not work); a few others report incorrect module system settings. Logging happens very seldom, but this feature will definitely log using JUL, so it would be good to make sure OpenSearch correctly redirects JUL logging to its own loggers. This could be a separate issue!
- The OpenSearch startup script should pass `--enable-preview` as a command line flag if exactly Java 19 is used to start OpenSearch. If this is not done, a warning gets logged (see above).
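For the JUL redirection, a minimal sketch of the system-property approach could look like the following. It assumes the log4j-jul adapter is on the classpath at runtime; the property name and manager class are the ones documented for that adapter, while the class `JulBridgeBootstrap` and its method are hypothetical and do not refer to existing OpenSearch code. Whether OpenSearch already does something equivalent has not been checked.

```java
/** Hypothetical bootstrap helper; not existing OpenSearch code. */
final class JulBridgeBootstrap {
    static final String JUL_MANAGER_PROPERTY = "java.util.logging.manager";
    // LogManager implementation shipped with the log4j-jul adapter.
    static final String LOG4J_JUL_MANAGER = "org.apache.logging.log4j.jul.LogManager";

    /**
     * Routes java.util.logging to log4j by selecting the adapter's LogManager.
     * This has to run before any class initializes java.util.logging,
     * otherwise the JDK's default LogManager is already in place.
     *
     * @return true if this call set the property, false if it was already set
     */
    static boolean installJulBridge() {
        if (System.getProperty(JUL_MANAGER_PROPERTY) != null) {
            return false; // respect an explicit choice made on the command line
        }
        System.setProperty(JUL_MANAGER_PROPERTY, LOG4J_JUL_MANAGER);
        return true;
    }
}
```

The same effect can be had without code by passing `-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager` on the command line, which avoids any ordering problems during startup.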
Important: Lucene 9.4 only supports this on Java 19 (exactly), because the APIs are in flux. If you start with Java 20, it falls back to the classical `MMapDirectory`. We will add support for Java 20 in a later release. The reason is that the class files of the new implementation are marked with special version numbers that make them compatible ONLY with Java 19, not earlier or later, to allow the JDK to change the API before the final release in Java 21. But passing `--enable-preview` to later versions won't hurt, so maybe enable it on all versions >= 19.
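The version check for the startup flag could be sketched as below. This is only an illustration of the two policies discussed here (exactly Java 19, or all versions >= 19); the class `PreviewFlag` and the method `extraJvmOptions` are hypothetical names, and how OpenSearch actually assembles its JVM options is not covered.

```java
import java.util.List;

/** Hypothetical helper for deciding whether to add --enable-preview. */
final class PreviewFlag {
    /**
     * @param featureVersion the Java feature version, e.g. 17, 19, 20
     * @param exactMatchOnly true: only Java 19; false: every version >= 19
     */
    static List<String> extraJvmOptions(int featureVersion, boolean exactMatchOnly) {
        boolean enable = exactMatchOnly ? featureVersion == 19 : featureVersion >= 19;
        return enable ? List.of("--enable-preview") : List.of();
    }

    public static void main(String[] args) {
        // Runtime.version().feature() reports the feature version of the running JVM.
        int feature = Runtime.version().feature();
        System.out.println(extraJvmOptions(feature, false));
    }
}
```

With `exactMatchOnly` set to `false`, Java 20 would also receive the flag even though Lucene 9.4 falls back to the classical implementation there, which matches the suggestion above that passing it to later versions does no harm.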
A last note: The downside of this new code is that closing and unmapping an index file becomes more expensive (it triggers a safepoint in the JVM). We have not yet found out how much this impacts servers that open and close index files a lot. Because of this, we would really like Amazon/OpenSearch to benchmark this, ideally with their users and customers able to enable it optionally. But the benchmarking should be done now, because Lucene will use the new implementation by default with, hopefully, Java 21. Java 20 will be the second and last preview round.