Elasticsearch

Elasticsearch is a highly scalable, open-source full-text search and analytics engine. It allows you to store, search and analyze large volumes of data quickly.

eXo Platform supports two deployment modes of Elasticsearch:

  • Embedded mode: an Elasticsearch node is embedded in each eXo Platform instance.

  • External mode: eXo Platform (deployed in standalone or cluster mode) connects to an external Elasticsearch deployment (standalone or cluster).


Note

With eXo Platform 4.4, the embedded mode is bundled by default with the platform as an add-on.

Note

The default list of MIME types whose content is indexed is: "text/.*, application/ms.*, application/vnd.*, application/xml, application/excel, application/powerpoint, application/xls, application/ppt, application/pdf, application/xhtml+xml, application/javascript, application/x-javascript, application/x-jaxrs+groovy, script/groovy". You can redefine this list in the exo.properties file by adding the parameter exo.unified-search.indexing.supportedMimeTypes=NEW-LIST (see Search connector configuration for more information).
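The sketch below makes the wildcard entries concrete by checking a MIME type against the default list. It assumes that ".*" (or a bare "*") acts as a wildcard, that every other character (including "+") is matched literally, and that the whole MIME type must match. The actual matching rule in eXo Platform may differ, and the function names are hypothetical.

```python
import re

# Default list from the note above. ASSUMPTION: ".*" (or a bare "*") is a
# wildcard; every other character, including "+", is matched literally.
DEFAULT_SUPPORTED_MIME_TYPES = (
    "text/.*,application/ms.*,application/vnd.*,application/xml,"
    "application/excel,application/powerpoint,application/xls,application/ppt,"
    "application/pdf,application/xhtml+xml,application/javascript,"
    "application/x-javascript,application/x-jaxrs+groovy,script/groovy"
)


def compile_mime_patterns(value: str) -> list:
    """Turn a comma-separated supportedMimeTypes value into compiled regexes."""
    patterns = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        regex = re.escape(entry)
        # Restore the wildcards that re.escape turned into literals.
        regex = regex.replace(r"\.\*", ".*").replace(r"\*", ".*")
        patterns.append(re.compile(regex))
    return patterns


def is_indexable(mime_type: str, patterns: list) -> bool:
    """True if the whole MIME type matches one of the patterns."""
    return any(p.fullmatch(mime_type) for p in patterns)
```

Under these assumptions, the default list accepts text/plain and application/msword but rejects image/png.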

Note

By default, the maximum size of a file whose content is indexed is 20 MB. You can redefine it in the exo.properties file by adding the parameter exo.unified-search.indexing.file.maxSize=xx (see Search connector configuration for more information).
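The following minimal sketch illustrates the size rule. It assumes that the maxSize value is given in megabytes, that 1 MB = 1024 × 1024 bytes, and that the limit is inclusive. All three are assumptions, so check Search connector configuration for the exact semantics. The helper names are hypothetical.

```python
DEFAULT_MAX_SIZE_MB = 20  # documented default


def max_size_bytes(max_size_mb: int = DEFAULT_MAX_SIZE_MB) -> int:
    # ASSUMPTION: the property is in megabytes and 1 MB = 1024 * 1024 bytes.
    return max_size_mb * 1024 * 1024


def is_content_indexed(file_size_bytes: int,
                       max_size_mb: int = DEFAULT_MAX_SIZE_MB) -> bool:
    # ASSUMPTION: a file exactly at the limit is still indexed.
    return file_size_bytes <= max_size_bytes(max_size_mb)
```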

This chapter covers the following topics:

Elasticsearch embedded mode

An Elasticsearch node is embedded in the eXo Platform server (and is hosted in the same JVM).

The Elasticsearch node has the following roles:

  • Master: manages the cluster, which contains only this node.

  • Data: indexes and stores documents.

  • Client: serves and coordinates requests from the platform.

By default:

  • The Elasticsearch cluster name (parameter es.cluster.name) is exoplatform-es.

  • The parameter es.network.host is set to 127.0.0.1. This blocks access from any IP address other than localhost and prevents other nodes from joining the Elasticsearch cluster.

  • The parameter es.http.port is set to 9200: Elasticsearch listens for HTTP connections on port 9200.
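Once the server is started, you can verify these defaults through the standard Elasticsearch REST API. GET / returns the node's cluster_name, and GET /_cluster/health reports the cluster status. The sketch below uses only the Python standard library, and the helper names are illustrative. Because the node is bound to 127.0.0.1, the script must run on the eXo Platform host itself.

```python
import json
import urllib.request

# Defaults documented above: es.network.host, es.http.port, es.cluster.name
ES_URL = "http://127.0.0.1:9200"
EXPECTED_CLUSTER_NAME = "exoplatform-es"


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_embedded_node(root_info: dict, health: dict,
                        expected_cluster_name: str = EXPECTED_CLUSTER_NAME) -> list:
    """Return problems found in the responses of GET / and GET /_cluster/health."""
    problems = []
    name = root_info.get("cluster_name")
    if name != expected_cluster_name:
        problems.append(f"unexpected cluster name: {name!r}")
    # "yellow" is normal on a single node when replicas are configured,
    # because a replica is never allocated on the node holding its primary.
    if health.get("status") == "red":
        problems.append("cluster health is red: some primary shards are unassigned")
    return problems


def main() -> None:
    info = fetch_json(ES_URL + "/")
    health = fetch_json(ES_URL + "/_cluster/health")
    print("cluster:", info.get("cluster_name"), "status:", health.get("status"))
    for problem in check_embedded_node(info, health):
        print("WARNING:", problem)
```

To run the check, call main() on the eXo Platform host.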

Note

Elasticsearch embedded mode properties are configurable through the exo.properties file. More details can be found here. You can also override the Elasticsearch embedded mode configuration by passing this property at server startup:

-Dexo.es.embedded.configuration.file=/absolute/path/to/file

Where /absolute/path/to/file is the absolute path to the YAML (.yml) configuration file.
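As an illustration, the sketch below renders a minimal YAML file that restates the defaults listed earlier and builds the matching startup option. It assumes the file accepts the standard Elasticsearch settings keys (cluster.name, network.host, http.port). Check which keys your eXo Platform version actually reads from this file before relying on it. The function names and the SETTINGS values are hypothetical.

```python
import os

# HYPOTHETICAL settings restating the documented defaults; the key names are
# assumed to be standard elasticsearch.yml keys.
SETTINGS = {
    "cluster.name": "exoplatform-es",
    "network.host": "127.0.0.1",
    "http.port": 9200,
}


def render_es_yaml(settings: dict) -> str:
    """Render flat settings as 'key: value' YAML lines."""
    return "".join(f"{key}: {value}\n" for key, value in settings.items())


def embedded_config_jvm_option(path: str) -> str:
    """Build the startup option; the property expects an absolute path."""
    return f"-Dexo.es.embedded.configuration.file={os.path.abspath(path)}"


def write_embedded_config(path: str, settings: dict = SETTINGS) -> str:
    """Write the YAML file and return the option to add to the server startup."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_es_yaml(settings))
    return embedded_config_jvm_option(path)
```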

Elasticsearch external mode

With the external mode, Elasticsearch nodes are not embedded in the eXo Platform server; instead, eXo Platform connects to an external Elasticsearch node or cluster.

To use the external mode, you need to uninstall the embedded mode using this command:

./addon uninstall meeds-es-embedded

or disable it in exo.properties by setting the property exo.es.embedded.enabled to false:

exo.es.embedded.enabled=false

The following plugin must be installed on the Elasticsearch instance:

Note

We highly recommend using Elasticsearch version 5.6.

As with the embedded mode, the external mode is configured through the following parameters in the exo.properties file:

  • exo.es.search.server.url: The URL of the node used for searching.

  • exo.es.search.server.username: The username used for BASIC authentication on the Elasticsearch node used for searching.

  • exo.es.search.server.password: The password used for BASIC authentication on the Elasticsearch node used for searching.

  • exo.es.index.server.url: The URL of the node used for indexing.

  • exo.es.index.server.username: The username used for BASIC authentication on the Elasticsearch node used for indexing.

  • exo.es.index.server.password: The password used for BASIC authentication on the Elasticsearch node used for indexing.
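The username and password parameters correspond to a standard HTTP BASIC Authorization header. The sketch below reads these properties from exo.properties-style text and builds the URL and header for each role. The parser handles only simple key=value lines, not the full Java properties syntax. The function names are hypothetical, and any sample values are placeholders, not defaults.

```python
import base64


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def es_endpoint(props: dict, role: str) -> tuple:
    """Return (url, headers) for role 'search' or 'index'."""
    prefix = f"exo.es.{role}.server."
    url = props[prefix + "url"]
    headers = {}
    username = props.get(prefix + "username")
    if username:
        password = props.get(prefix + "password", "")
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    return url, headers
```

For example, you can pass the returned values to urllib.request.Request(url + "/_cluster/health", headers=headers) to check that each node accepts the configured credentials.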

You can find more details about these parameters, including their default values and descriptions, in the Properties reference table.

Elasticsearch Indexing architecture

Indexes

An index in Elasticsearch is similar to a table in a relational database. It has a mapping that defines the fields in the index, grouped by one or more types. An index is a logical namespace that maps to one or more primary shards and can have zero or more replica shards.

Learn more about indexing in Elasticsearch here.

With eXo Platform and Elasticsearch, each application (Wiki, Calendar, Documents…) has a dedicated index. All data of an application (for example, for the Wiki application: wikis, wiki pages and wiki attachments) is indexed in the same index.

Sharding

A shard is a single Lucene instance. It is a low-level worker unit which is managed automatically by Elasticsearch.

Learn more about Sharding in Elasticsearch here.

In eXo Platform with Elasticsearch:

  • Sharding is used only for horizontal scalability.

  • eXo Platform does not use routing policies to route documents or document types to a specific shard.

  • The default number of shards is 5, which is the Elasticsearch default.

  • This value is configurable per index by setting the parameter shard.number in the constructor parameters of the connectors.
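eXo Platform does not set custom routing, so Elasticsearch places each document with its default rule, conceptually shard = hash(_id) % number_of_primary_shards. The sketch below illustrates this rule using zlib.crc32 as a stand-in for the Murmur3-based hash that Elasticsearch actually uses. The shard numbers it produces therefore will not match a real cluster. It only shows how documents spread over all primaries. It also suggests why changing shard.number for an existing index would move documents, which is why Elasticsearch fixes the number of primary shards when an index is created.

```python
import zlib
from collections import Counter


def shard_for(doc_id: str, number_of_primary_shards: int = 5) -> int:
    # Stand-in for Elasticsearch's Murmur3-based routing hash (illustration only).
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


def distribution(doc_ids, number_of_primary_shards: int = 5) -> Counter:
    """Count how many documents land on each primary shard."""
    return Counter(shard_for(d, number_of_primary_shards) for d in doc_ids)
```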

Replicas

  • Each index can be replicated across the Elasticsearch cluster.

  • The default number of replicas is 1 (the default value of Elasticsearch) which means one replica for each primary shard.

  • This value is configurable per index by setting the parameter replica.number in the constructor parameters of the connectors.
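With the default values, each index has 5 primary shards plus 1 replica of each, which gives 5 × (1 + 1) = 10 shard copies per index. The sketch below performs this arithmetic and builds an index-creation body with the Elasticsearch settings number_of_shards and number_of_replicas. How the connectors pass shard.number and replica.number to Elasticsearch is an assumption here, and the function names are hypothetical. Elasticsearch never places two copies of the same shard on one node, so on a single node (such as the embedded mode) only the primaries can be assigned.

```python
def index_settings(shard_number: int = 5, replica_number: int = 1) -> dict:
    """Body for creating an index (PUT /<index>) with the given counts."""
    return {"settings": {"index": {"number_of_shards": shard_number,
                                   "number_of_replicas": replica_number}}}


def shard_copies(shard_number: int = 5, replica_number: int = 1) -> int:
    # Each primary shard plus replica_number copies of it.
    return shard_number * (1 + replica_number)


def assignable_copies(nodes: int, shard_number: int = 5, replica_number: int = 1) -> int:
    # Two copies of the same shard are never allocated on the same node,
    # so at most `nodes` copies of each shard can be assigned.
    return shard_number * min(1 + replica_number, nodes)
```

For example, with the defaults a one-node cluster can assign only 5 of the 10 copies of each index, so its health stays yellow. A cluster with two or more nodes can assign all 10.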