Posted in

在 Java 中实现 Elasticsearch 聚合查询_AI阅读总结 — 包阅AI

包阅导读总结

1. 关键词:Elasticsearch、Aggregation Query、Java、Spring Data、Product

2. 总结:

本文介绍了在 Java 中使用 Spring Data Elasticsearch 实现 Elasticsearch 聚合查询,包括聚合的概念和工作原理,创建索引、添加文档的示例,项目的搭建步骤(如设置依赖、配置、创建模型和仓库等),还给出了添加聚合的服务类实现。

3. 主要内容:

– 理解聚合

– 聚合是 Elasticsearch 用于复杂数据分析的工具,可分组处理数据计算统计等。

– 以管理产品实体的示例应用展示按类别聚合文档数量。

– 介绍通过 curl 命令创建索引和添加文档,及验证文档添加成功。

– 解释聚合工作原理和带聚合的查询基本结构。

– 项目设置

– 在 pom.xml 中添加必要依赖。

– 创建配置类设置 Elasticsearch。

– 定义代表文档的 Product 类。

– 创建返回按类别分组产品数量的仓库接口。

– 在服务类中添加聚合实现,计算每个类别的产品数量。

思维导图:

文章地址:https://www.javacodegeeks.com/implementing-an-elasticsearch-aggregation-query-in-java.html

文章来源:javacodegeeks.com

作者:Omozegie Aziegbe

发布时间:2024/8/12 13:01

语言:英文

总字数:1518字

预计阅读时间:7分钟

评分:82分

标签:Elasticsearch,Java,Spring Boot,数据分析,REST API


以下为原文内容

本内容来源于用户推荐转载,旨在分享知识与观点,如有侵权请联系删除 联系邮箱 media@ilingban.com

Elasticsearch is a distributed search engine that is designed for scalability, flexibility, and high performance. It allows us to store, search, and analyze large volumes of data quickly. One of its key features is the capability to perform aggregations, allowing us to conduct complex data analysis and extract statistical insights from our indexed data. In this article, we will walk through how to add an aggregation to an Elasticsearch query using Spring Data Elasticsearch.

1. Understanding Aggregations

Aggregations in Elasticsearch are tools that enable us to perform complex data analysis and generate insightful metrics on our indexed data. They work by processing and grouping our data in various ways, allowing us to compute statistics, histograms, and other advanced analytics.

1.1 Sample Application Overview

The sample application used in this article is designed to manage a collection of Product entities, which include fields such as id, name, category, and price. The application’s primary focus is to demonstrate how to perform aggregation queries in Elasticsearch using Spring Data, specifically how to count the number of documents per category.

To create an index named product-items in Elasticsearch and add a few documents to it, we will use the curl command to interact with Elasticsearch’s REST API.

First, create the product-items index.

curl -X PUT "localhost:9200/product-items" -H 'Content-Type: application/json' -d'{  "settings": {    "number_of_shards": 1,    "number_of_replicas": 1  }}'

This command will create an index named product-items with one shard and one replica.

Next, add a few documents to the product-items index. Each document will represent a product item, with fields like name, price, and category.

Document 1

curl -X POST "localhost:9200/product-items/_doc/1" -H 'Content-Type: application/json' -d'{  "name": "Laptop",  "price": 999.99,  "category": "Electronics"}'

Document 2

curl -X POST "localhost:9200/product-items/_doc/2" -H 'Content-Type: application/json' -d'{  "name": "Smartphone",  "price": 699.99,  "category": "Electronics"}'

Document 3

curl -X POST "localhost:9200/product-items/_doc/3" -H 'Content-Type: application/json' -d'{  "name": "Vans",  "price": 399.99,  "category": "Clothing"}'

To verify that the documents were added successfully, we can retrieve them using a search query:

curl -X GET "localhost:9200/product-items/_search" -H 'Content-Type: application/json' -d'{  "query": {    "match_all": {}  }}'

1.2 How Aggregations Work

Aggregations operate on the documents that match a query. Aggregations allow us to perform statistical operations on groups of documents. For instance, we can count the number of documents matching a specific criteria, calculate averages, find minimum and maximum values, or group data based on specific fields.

1.2.1 Basic Structure of an Elasticsearch Query with Aggregations

An Elasticsearch query with aggregations has the following structure:

{  "query": {    "match_all": {}  },  "aggs": {    "by_category": {      "terms": {        "field": "category.keyword"      },      "aggs": {        "average_price": {          "avg": {            "field": "price"          }        }      }    }  }}

In this query:

  • The match_all query matches all documents.
  • The terms aggregation by_category groups documents by the category field.
  • Within each category bucket, an avg aggregation average_price calculates the average price of products.

2. Project Setup

First, we set up a Spring Boot project with the necessary dependencies. In the pom.xml file, include the following dependencies:

        <dependency>            <groupId>org.springframework.boot</groupId>            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>        </dependency>        <dependency>            <groupId>org.springframework.boot</groupId>            <artifactId>spring-boot-starter-data-jpa</artifactId>        </dependency>        <dependency>            <groupId>org.springframework.boot</groupId>            <artifactId>spring-boot-starter-web</artifactId>        </dependency>        <dependency>            <groupId>org.elasticsearch.client</groupId>            <artifactId>elasticsearch-rest-high-level-client</artifactId>            <version>7.17.11</version>        </dependency>

2.1 Configure Elasticsearch

Next, we set up Elasticsearch by creating a configuration class.

@Configuration@EnableElasticsearchRepositoriespublic class ElasticsearchConfig {    @Bean    public RestClient restHighLevelClient() {        return RestClient.builder(                new HttpHost("localhost", 9200, "http"),                new HttpHost("localhost", 9201, "http"))                .build();    }    @Bean    public ElasticsearchClient elasticsearchClient(RestClient restClient) {        return ElasticsearchClients.createImperative(restClient);    }    @Bean(name = { "elasticsearchOperations", "elasticsearchTemplate" })    public ElasticsearchOperations elasticsearchOperations(ElasticsearchClient searchClient) {        ElasticsearchTemplate template = new ElasticsearchTemplate(searchClient);        template.setRefreshPolicy(RefreshPolicy.NONE);        return template;    }}
  • RestClient Bean: The RestClient is the main client that interacts with Elasticsearch. It is set up to connect to an Elasticsearch cluster running on localhost at ports 9200 and 9201. You can adjust the HttpHost values to match your specific Elasticsearch setup.
  • The ElasticsearchClient is a client provided by Elasticsearch. We create the ElasticsearchClient by wrapping the RestClient using the ElasticsearchClients.createImperative() method.
  • ElasticsearchOperations is an abstraction provided by Spring Data Elasticsearch, which simplifies the interaction with Elasticsearch. ElasticsearchTemplate is an implementation of ElasticsearchOperations, leveraging the ElasticsearchClient for its operations.

2.2 Create the Model

Next, let’s define the Product class. This class will represent the documents stored in our Elasticsearch index.

@Document(indexName = "product-items")public class Product {        @Id    private String id;        @Field(type = Keyword)    private String name;        @Field(type = Keyword)    private String category;        @Field(type = Keyword)    private float price;    public String getId() {        return id;    }    public void setId(String id) {        this.id = id;    }    public String getName() {        return name;    }    public void setName(String name) {        this.name = name;    }    public String getCategory() {        return category;    }    public void setCategory(String category) {        this.category = category;    }    public float getPrice() {        return price;    }    public void setPrice(float price) {        this.price = price;    }   }

This code snippet defines a Java class Product that is used to map documents in an Elasticsearch index named product-items using Spring Data Elasticsearch. The @Document annotation specifies that instances of the Product class will be indexed in Elasticsearch under the index name product-items.

Within the Product class, the @Field annotations are used to define the type of each field in the Elasticsearch index. In this case, all fields (name, category, price) are of type Keyword. The Keyword type is suitable for exact matches and aggregations, and is typically used for fields where the value should not be analyzed, such as categories.

2.3 Create the Repository

Next, create an interface to return the number of products grouped by category.

import org.springframework.data.domain.Pageable;import org.springframework.data.elasticsearch.core.SearchPage;public interface ProductItemRepository {    SearchPage<Product> getProductsCountByCategory(Pageable pageable);}

This interface declares a method that will return the number of products grouped by category, with results paginated according to the Pageable parameter.

Next, create a repository interface for Product:

@Repositorypublic interface ProductRepository extends ElasticsearchRepository<Product, String>, ProductItemRepository {}

Here, we are extending ElasticsearchRepository<Product, String> which is a Spring Data Elasticsearch interface that provides standard CRUD (Create, Read, Update, Delete) operations for the Product entity. By extending ProductItemRepository interface, the ProductRepository inherits the custom method to perform the query on our Product entity.

2.4 Add Aggregations

Now, let’s focus on adding aggregations. We will add a simple aggregation to count the number of products in each category. We will create a service class that uses ElasticsearchOperations to execute the aggregation queries.

@Servicepublic class ProductService implements ProductItemRepository {    @Autowired    private ElasticsearchOperations elasticsearchOperations;    @Override    public SearchPage<Product> getProductsCountByCategory(Pageable pageable) {        Query query = NativeQuery.builder()                .withAggregation("category_aggregation",                        Aggregation.of(b -> b.terms(t -> t.field("category.keyword"))))                .build();        SearchHits<Product> response = elasticsearchOperations.search(query, Product.class);        return SearchHitSupport.searchPageFor(response, pageable);    }        public Map<String, Long> countByCategory() {        Query query = NativeQuery.builder()                .withAggregation("category_aggregation",                        Aggregation.of(b -> b.terms(t -> t.field("category.keyword"))))                .build();        SearchHits<Product> searchHits = elasticsearchOperations.search(query, Product.class);        Map<String, Long> aggregatedData = ((ElasticsearchAggregations) searchHits                .getAggregations())                .get("category_aggregation")                .aggregation()                .getAggregate()                .sterms()                .buckets()                .array()                .stream()                .collect(Collectors.toMap(bucket -> bucket.key().stringValue(), MultiBucketBase::docCount));        return aggregatedData;    }}

Here, we use NativeQuery.builder() to build a query with a terms aggregation. In the getProductsCountByCategory(Pageable pageable) method, the aggregation, named "category_aggregation", counts the number of documents for each unique value in the category field. The query is executed using elasticsearchOperations.search(), and the result is converted to a SearchPage for pagination.

The countByCategory method executes a terms aggregation query that counts the number of documents in each category.

2.5 Expose the Aggregation Result via REST

Finally, expose the aggregation results via a REST controller:

@RestControllerpublic class ProductController {    @Autowired    private ProductService productService;    @GetMapping("/products-count-by-category")    public List<Product> getProductsCountByCategory(@RequestParam(defaultValue = "2") int size) {        SearchHits<Product> searchHits = productService.getProductsCountByCategory(Pageable.ofSize(size))                .getSearchHits();        return searchHits.getSearchHits()                .stream()                .map(SearchHit::getContent)                .collect(Collectors.toList());    }    @GetMapping("/count-by-category")    public Map<String, Long> countByCategory() {        return productService.countByCategory();    }

The ProductController class exposes two endpoints.

  • The /products-count-by-category endpoint takes size as a query parameter for pagination. It calls the getProductsCountByCategory() method from the ProductService to fetch the aggregated data.
  • The /count-by-category endpoint calls the countByCategory method and returns the counts as a JSON object.

3. Running and Testing the Application

With everything set up, we can run the application. The endpoint /products-count-by-category will return a list of Product objects, grouped by category with the number of documents (products) in each category.

We can test the /products-count-by-category endpoint using the following curl command:

curl -X GET "http://localhost:8080/products-count-by-category?size=2" -H "Content-Type: application/json"

We can test the new /count-by-category endpoint using the following curl command:

curl -X GET "http://localhost:8080/count-by-category" -H "Content-Type: application/json"

The response will be a JSON object where the keys are the category names and the values are the number of documents in each category. The screenshot below shows the output when viewed from a web browser.

This output shows that there are 2 products in the “Electronics” category and 1 in “Clothing.” The exact counts will depend on the data in your Elasticsearch index.

4. Conclusion

In this article, we’ve explored how to integrate and configure Elasticsearch in a Spring Boot application. We set up a basic configuration using a configuration class, which included defining essential beans like RestClient, ElasticsearchClient, and ElasticsearchOperations. These components are crucial for interacting with Elasticsearch leveraging advanced features like aggregations.

5. Download the Source Code

This was an article on how to perform an Elasticsearch aggregation query.