Elasticsearch
Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze large volumes of data quickly.
eXo Platform supports two deployment modes of Elasticsearch:
Embedded mode: One node of Elasticsearch embedded in each eXo Platform instance.
External mode: eXo Platform (deployed in standalone or cluster mode) is connected to an external Elasticsearch (deployed in standalone or cluster mode).
Note
With eXo Platform 4.4, the embedded mode is bundled by default with the platform as an add-on.
Note
By default, the content of files with the following MIME types is indexed: “text/.*, application/ms.*, application/vnd.*, application/xml, application/excel, application/powerpoint, application/xls, application/ppt, application/pdf, application/xhtml+xml, application/javascript, application/x-javascript, application/x-jaxrs+groovy, script/groovy”. This list can be redefined in the exo.properties file by adding the parameter exo.unified-search.indexing.supportedMimeTypes=NEW-LIST (more information in Search connector configuration).
Note
By default, the maximum size of a file to be indexed is 20 MB. A new maximum can be defined in the exo.properties file by adding the parameter exo.unified-search.indexing.file.maxSize=xx (more information in Search connector configuration).
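The two indexing rules above (supported MIME types and maximum file size) can be sketched as a simple eligibility check. This is a minimal illustration, not eXo Platform's implementation: the function names are hypothetical, and it assumes that ".*" acts as a wildcard while every other character is matched literally, and that 20 MB means 20 × 1024 × 1024 bytes.

```python
import re

# Default patterns from the note above; ".*" is treated as a wildcard (assumption).
DEFAULT_MIME_PATTERNS = [
    "text/.*", "application/ms.*", "application/vnd.*", "application/xml",
    "application/excel", "application/powerpoint", "application/xls",
    "application/ppt", "application/pdf", "application/xhtml+xml",
    "application/javascript", "application/x-javascript",
    "application/x-jaxrs+groovy", "script/groovy",
]
DEFAULT_MAX_SIZE_BYTES = 20 * 1024 * 1024  # 20 MB, assuming binary megabytes


def _to_regex(pattern):
    # Escape every character, then restore the escaped ".*" as a wildcard,
    # so that "+" in "application/xhtml+xml" is matched literally.
    escaped = re.escape(pattern).replace(re.escape(".*"), ".*")
    return re.compile(escaped)


def is_indexable(mime_type, size_bytes,
                 patterns=DEFAULT_MIME_PATTERNS,
                 max_size=DEFAULT_MAX_SIZE_BYTES):
    """Return True if a file of this type and size would qualify for indexing."""
    if size_bytes > max_size:
        return False
    return any(_to_regex(p).fullmatch(mime_type) for p in patterns)
```

For example, is_indexable("application/msword", 1024) is accepted by the "application/ms.*" pattern, while is_indexable("image/png", 1024) is rejected because no pattern covers it.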
This chapter covers the following topics:
Elasticsearch configuration for embedded mode: Configuration for Elasticsearch embedded mode.
Elasticsearch configuration for external mode: Configuration for Elasticsearch external mode.
Elasticsearch Indexing architecture: Indexing architecture.
Elasticsearch embedded mode
An Elasticsearch node is embedded in the eXo Platform server (and is hosted in the same JVM).
The Elasticsearch node is declared as:
Master: To manage the cluster with only one node.
Data: To index and store documents.
Client: To serve and coordinate requests from the platform.
By default:
The parameter es.cluster.name of the Elasticsearch cluster is exoplatform-es.
The parameter es.network.host is set to 127.0.0.1. This prevents access from any IP address other than localhost and prevents other nodes from joining the Elasticsearch cluster.
The parameter es.http.port is set to 9200: Elasticsearch is bound to port 9200 for HTTP connections.
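With these defaults, the embedded node should be reachable over HTTP only from the local machine. The sketch below builds the base URL from the default values and queries the cluster health endpoint (_cluster/health, part of the standard Elasticsearch REST API). The helper names are hypothetical, and the request can only succeed when run on the same host as a running eXo Platform server.

```python
import json
import urllib.request

# Default values described above
DEFAULTS = {
    "es.cluster.name": "exoplatform-es",
    "es.network.host": "127.0.0.1",
    "es.http.port": "9200",
}


def base_url(settings=DEFAULTS):
    """Build the HTTP base URL of the embedded node from its settings."""
    return "http://{}:{}".format(settings["es.network.host"], settings["es.http.port"])


def cluster_health(settings=DEFAULTS, timeout=5):
    """Fetch the cluster health document as a dictionary.

    Must be called on the server host: the node listens on 127.0.0.1 only.
    """
    url = base_url(settings) + "/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling cluster_health() on the server host returns a dictionary whose cluster_name field should be exoplatform-es if the defaults are unchanged.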
Note
Elasticsearch embedded mode properties are configurable through the exo.properties file. More details can be found here. It is also possible to override the Elasticsearch embedded mode configuration by passing this property at server startup:
-Dexo.es.embedded.configuration.file=/absolute/path/to/file
Where /absolute/path/to/file is the absolute path to the yml configuration file.
Elasticsearch external mode
In external mode, Elasticsearch nodes are not embedded in the eXo Platform server; eXo Platform connects to an external Elasticsearch node or cluster.
To use the external mode, you need to uninstall the embedded mode using this command:
./addon uninstall meeds-es-embedded
Alternatively, disable it in exo.properties by setting the property exo.es.embedded.enabled to false:
exo.es.embedded.enabled=false
The following plugin must be installed on Elasticsearch instance:
Note
We highly recommend using Elasticsearch version 5.6.
As with the embedded mode, some parameters should be configured for the external mode through the exo.properties file:
exo.es.search.server.url: The URL of the node used for searching.
exo.es.search.server.username: The username used for BASIC authentication on the Elasticsearch node used for searching.
exo.es.search.server.password: The password used for BASIC authentication on the Elasticsearch node used for searching.
exo.es.index.server.url: The URL of the node used for indexing.
exo.es.index.server.username: The username used for BASIC authentication on the Elasticsearch node used for indexing.
exo.es.index.server.password: The password used for BASIC authentication on the Elasticsearch node used for indexing.
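As an illustration of how these properties fit together, the sketch below reads them from exo.properties-style text and prepares an HTTP request carrying a BASIC authentication header for the search node. The URL and credentials are placeholder values and the helper names are hypothetical; this is not how eXo Platform itself builds its requests.

```python
import base64
import urllib.request

SAMPLE_PROPERTIES = """
# Placeholder values for illustration only
exo.es.search.server.url=http://es.example.com:9200
exo.es.search.server.username=user
exo.es.search.server.password=pass
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def basic_auth_header(username, password):
    """Encode credentials as an HTTP Basic Authorization header value."""
    token = base64.b64encode("{}:{}".format(username, password).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_search_request(props, path="/_search"):
    """Prepare (but do not send) a request to the node used for searching."""
    request = urllib.request.Request(props["exo.es.search.server.url"] + path)
    username = props.get("exo.es.search.server.username")
    if username:
        password = props.get("exo.es.search.server.password", "")
        request.add_header("Authorization", basic_auth_header(username, password))
    return request
```

The same pattern would apply to the exo.es.index.server.* properties for the node used for indexing.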
You can find more details about the above parameters, including their default values and descriptions, in the Properties reference table.
Elasticsearch Indexing architecture
Indexes
An index in Elasticsearch is like a table in a relational database. It has a mapping that defines the fields in the index, which are grouped into multiple types. An index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards.
Learn more about indexing in Elasticsearch here.
With eXo Platform and Elasticsearch, an index is dedicated to each application (Wiki, Calendar, Documents…). All the data of an application (for example, for the Wiki application: wikis, wiki pages, and wiki attachments) is indexed in the same index.
Sharding
A shard is a single Lucene instance. It is a low-level worker unit which is managed automatically by Elasticsearch.
Learn more about Sharding in Elasticsearch here.
In eXo Platform with Elasticsearch:
Sharding will only be used for horizontal scalability.
eXo Platform does not use routing policies to route documents or document types to a specific shard.
The default number of shards is 5, which is the Elasticsearch default.
This value is configurable per index by setting the parameter shard.number in the constructor parameters of the connectors.
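Because no custom routing is used, documents are spread across shards by Elasticsearch's default rule, which hashes a routing value (the document ID by default) and takes the remainder modulo the number of primary shards. The sketch below illustrates that principle only: it uses CRC32 as a stand-in hash (Elasticsearch uses a different hash function), so the shard numbers it produces will not match a real cluster, and the document IDs are invented for the example.

```python
import zlib
from collections import Counter

DEFAULT_SHARDS = 5  # default number of primary shards mentioned above


def shard_for(doc_id, number_of_shards=DEFAULT_SHARDS):
    """Pick a shard using hash(id) modulo the number of primary shards."""
    # CRC32 is a stand-in hash used only to illustrate the principle.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def distribution(doc_ids, number_of_shards=DEFAULT_SHARDS):
    """Count how many documents land on each shard."""
    return Counter(shard_for(doc_id, number_of_shards) for doc_id in doc_ids)


# Hypothetical document IDs
counts = distribution("wiki-page-{}".format(i) for i in range(1000))
for shard in sorted(counts):
    print("shard", shard, "->", counts[shard], "documents")
```

Since the shard is derived from the shard count, the same document ID always maps to the same shard as long as the number of primary shards does not change.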
Replicas
Each index can be replicated over the Elasticsearch cluster.
The default number of replicas is 1 (the default value of Elasticsearch), which means one replica for each primary shard.
This value is configurable per index by setting the parameter replica.number in the constructor parameters of the connectors.
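To make the defaults concrete: with 5 primary shards and 1 replica, each index is stored as 5 × (1 + 1) = 10 shard copies across the cluster. The sketch below computes that figure and builds a matching index settings body. number_of_shards and number_of_replicas are the standard Elasticsearch index setting names; the function names are hypothetical, and the way shard.number and replica.number are translated into these settings is an assumption.

```python
import json


def index_settings(shard_number=5, replica_number=1):
    """Build an Elasticsearch index settings body from the two values."""
    return {
        "settings": {
            "number_of_shards": shard_number,
            "number_of_replicas": replica_number,
        }
    }


def total_shard_copies(shard_number=5, replica_number=1):
    """Each primary shard is stored once, plus once per replica."""
    return shard_number * (1 + replica_number)


print(json.dumps(index_settings(), indent=2))
print(total_shard_copies())  # 10 with the defaults
```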