What is Milvus?

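  • Milvus is an open-source, high-performance vector database that is both powerful and flexible across a wide range of use cases.
    Many modern applications need to process and analyze vector data at scale. Vector data is widely used in image and video search, recommender systems, natural language processing, and bioinformatics, for example.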

Project background

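  • In our company's recommender system, we need to recommend relevant content to users based on their past behavior and interests. We therefore represent both users and content items as vectors and use Milvus for similarity matching. With the user vectors and content vectors stored in Milvus, its efficient similarity search lets us quickly find the content that best matches a user's interests and make personalized recommendations.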

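  • The vectors are generated and written to Milvus by a Spark job. This article covers only the query side: integrating Milvus into Spring Boot for a consumer-facing service. The functionality has been tested.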
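
To make the idea of similarity matching concrete, here is a minimal brute-force sketch of what a top-K inner-product search computes. It is not how Milvus works internally (Milvus uses indexes to avoid scanning every vector); the class name, method names, and sample vectors are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class InnerProductTopK {

    /** Inner product of two vectors of equal length. */
    static float innerProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** IDs of the topK content vectors with the largest inner product against the user vector. */
    static List<Long> topK(float[] user, Map<Long, float[]> contents, int topK) {
        List<Long> ids = new ArrayList<>(contents.keySet());
        // Sort by score, highest first.
        ids.sort(Comparator.comparingDouble((Long id) -> innerProduct(user, contents.get(id))).reversed());
        return new ArrayList<>(ids.subList(0, Math.min(topK, ids.size())));
    }

    public static void main(String[] args) {
        Map<Long, float[]> contents = new LinkedHashMap<>();
        contents.put(101L, new float[] {1f, 0f}); // score 2
        contents.put(102L, new float[] {0f, 1f}); // score 1
        contents.put(103L, new float[] {1f, 1f}); // score 3
        float[] user = {2f, 1f};
        System.out.println(topK(user, contents, 2)); // prints [103, 101]
    }
}
```

The returned IDs play the same role as the Long IDs returned by the search methods later in this article.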

Maven dependency

  • We started on the 1.x SDK and later upgraded to 2.2.3 because 2.x added filtered search. Querying differs somewhat between 1.x and 2.x; we recommend 2.x.
<dependency>
    <groupId>io.milvus</groupId>
    <artifactId>milvus-sdk-java</artifactId>
    <version>2.2.3</version>
</dependency>

Auto-configuration

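import io.milvus.client.MilvusServiceClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Scope;

@Configuration
public class MilvusConfiguration {
    /**
     * Milvus IP address
     */
    @Value("${milvus.config.ipAddr}")
    private String ipAddr;
    /**
     * Milvus port
     */
    @Value("${milvus.config.port}")
    private Integer port;

    @Bean
    @Scope("singleton")
    public MilvusServiceClient getMilvusClient() {
        return getMilvusFactory().getMilvusClient();
    }

    // Spring calls init() after construction and close() on shutdown.
    @Bean(initMethod = "init", destroyMethod = "close")
    public MilvusRestClientFactory getMilvusFactory() {
        return MilvusRestClientFactory.build(ipAddr, port);
    }
}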

Milvus Rest client wrapper

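import io.milvus.client.MilvusServiceClient;
import io.milvus.param.ConnectParam;

// Process-wide singleton that owns the MilvusServiceClient; Spring calls init() and close().
public class MilvusRestClientFactory {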
    private static String  IP_ADDR;
    private static Integer PORT ;
    private MilvusServiceClient milvusServiceClient;
    private ConnectParam.Builder  connectParamBuilder;
    private static MilvusRestClientFactory milvusRestClientFactory = new MilvusRestClientFactory();
    private MilvusRestClientFactory(){
    }
    public static MilvusRestClientFactory build(String ipAddr, Integer  port) {
        IP_ADDR = ipAddr;
        PORT = port;
        return milvusRestClientFactory;
    }
    private ConnectParam.Builder connectParamBuilder(String host, int port) {
        return  ConnectParam.newBuilder().withHost(host).withPort(port);
    }
    public void init() {
        connectParamBuilder =  connectParamBuilder(IP_ADDR,PORT);
        ConnectParam connectParam = connectParamBuilder.build();
        milvusServiceClient =new MilvusServiceClient(connectParam);
    }
    public MilvusServiceClient getMilvusClient() {
        return milvusServiceClient;
    }
    public void close() {
        if (milvusServiceClient != null) {
            try {
                milvusServiceClient.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}

Query

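The results you read back depend on the data you wrote. In my case the final result is a collection of Long values, so treat the code below as a reference only.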

  • Synchronous Milvus search
/**
 * Synchronous Milvus search.
 * @param collectionName collection name
 * @param vectors query vectors
 * @param topK number of most similar vectors to return
 * @return IDs (Long) of the matching entities
 */
public List<Long> search(String collectionName, List<List<Float>> vectors, Integer topK) {
    Assert.notNull(collectionName, "collectionName is null");
    Assert.notNull(vectors, "vectors is null");
    Assert.notEmpty(vectors, "vectors is empty");
    Assert.notNull(topK, "topK is null");
    // Note: nprobe is set to the dimension of the first query vector here. It is an
    // IVF index search parameter and is usually tuned independently of the dimension.
    int nprobeVectorSize = vectors.get(0).size();
    // The inner quotes must be escaped, otherwise this line does not compile.
    String paramsInJson = "{\"nprobe\": " + nprobeVectorSize + "}";
    SearchParam searchParam =
            SearchParam.newBuilder().withCollectionName(collectionName)
                    .withParams(paramsInJson)
                    .withMetricType(MetricType.IP)
                    .withVectors(vectors)
                    .withVectorFieldName("embedding")
                    .withTopK(topK)
                    .build();
    R<SearchResults> searchResultsR = milvusServiceClient.search(searchParam);
    SearchResults searchResultsRData = searchResultsR.getData();
    List<Long> topksList = searchResultsRData.getResults().getIds().getIntId().getDataList();
    return topksList;
}
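
When more than one query vector is passed, the IDs come back as a single flat list. As far as I understand the 2.x result format, the hits for each query are stored one after another, and a separate per-query count list (exposed as getTopksList() on the result data) says how many hits belong to each query. The sketch below splits such a flat list under that assumption. The class and method names are hypothetical, and the inputs are plain lists rather than SDK objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SearchResultSplitter {

    /**
     * Splits a flat ID list into one list per query vector.
     * topks.get(i) is the number of hits returned for query i.
     */
    static List<List<Long>> splitByQuery(List<Long> flatIds, List<Long> topks) {
        List<List<Long>> perQuery = new ArrayList<>();
        int offset = 0;
        for (long k : topks) {
            int end = offset + (int) k;
            if (end > flatIds.size()) {
                throw new IllegalArgumentException("topks add up to more hits than flatIds contains");
            }
            // Copy the slice so the result does not depend on the input list.
            perQuery.add(new ArrayList<>(flatIds.subList(offset, end)));
            offset = end;
        }
        return perQuery;
    }

    public static void main(String[] args) {
        List<Long> flatIds = Arrays.asList(11L, 12L, 13L, 21L, 22L);
        List<Long> topks = Arrays.asList(3L, 2L); // query 0 got 3 hits, query 1 got 2
        System.out.println(splitByQuery(flatIds, topks)); // prints [[11, 12, 13], [21, 22]]
    }
}
```

With a single query vector, as in the recommendation use case here, the flat list can be used directly.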
  • Synchronous Milvus search with a filter condition
/**
 * Synchronous Milvus search with a filter condition.
 *
 * @param collectionName collection name
 * @param vectors query vectors
 * @param topK number of most similar vectors to return
 * @param exp filter expression, e.g. status == 1
 * @return IDs (Long) of the matching entities
 */
public List<Long> search(String collectionName, List<List<Float>> vectors, Integer topK, String exp) {
    Assert.notNull(collectionName, "collectionName is null");
    Assert.notNull(vectors, "vectors is null");
    Assert.notEmpty(vectors, "vectors is empty");
    Assert.notNull(topK, "topK is null");
    Assert.notNull(exp, "exp is null");
    // Note: nprobe is set to the dimension of the first query vector here. It is an
    // IVF index search parameter and is usually tuned independently of the dimension.
    int nprobeVectorSize = vectors.get(0).size();
    // The inner quotes must be escaped, otherwise this line does not compile.
    String paramsInJson = "{\"nprobe\": " + nprobeVectorSize + "}";
    SearchParam searchParam =
            SearchParam.newBuilder().withCollectionName(collectionName)
                    .withParams(paramsInJson)
                    .withMetricType(MetricType.IP)
                    .withVectors(vectors)
                    .withExpr(exp)
                    .withVectorFieldName("embedding")
                    .withTopK(topK)
                    .build();
    R<SearchResults> searchResultsR = milvusServiceClient.search(searchParam);
    SearchResults searchResultsRData = searchResultsR.getData();
    List<Long> topksList = searchResultsRData.getResults().getIds().getIntId().getDataList();
    return topksList;
}
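
The filter passed to withExpr is a plain string, so it is easy to get the syntax wrong (for example, equality is written ==, not =). Below is a small, hypothetical helper for assembling simple expressions from numeric values. It only concatenates strings; it does not validate field names against a collection schema, and the field name category is made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class MilvusExpr {

    /** field == value */
    static String eq(String field, long value) {
        return field + " == " + value;
    }

    /** field in [v1, v2, ...] */
    static String in(String field, List<Long> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("values must not be empty");
        }
        String joined = values.stream().map(String::valueOf).collect(Collectors.joining(", "));
        return field + " in [" + joined + "]";
    }

    /** (left) && (right) */
    static String and(String left, String right) {
        return "(" + left + ") && (" + right + ")";
    }

    public static void main(String[] args) {
        String expr = and(eq("status", 1), in("category", Arrays.asList(3L, 5L)));
        System.out.println(expr); // prints (status == 1) && (category in [3, 5])
    }
}
```

The resulting string can be passed as the exp argument of the filtered search method above.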
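  • Asynchronous Milvus search: for scenarios where results are not needed in real time
/**
 * Asynchronous Milvus search.
 *
 * @param collectionName collection name
 * @param vectors query vectors
 * @param partitionList names of the partitions to search
 * @param topK number of most similar vectors to return
 * @return IDs (Long) of the matching entities
 */
public List<Long> searchAsync(String collectionName, List<List<Float>> vectors,
                              List<String> partitionList, Integer topK) throws ExecutionException, InterruptedException {
    Assert.notNull(collectionName, "collectionName is null");
    Assert.notNull(vectors, "vectors is null");
    Assert.notEmpty(vectors, "vectors is empty");
    Assert.notNull(partitionList, "partitionList is null");
    Assert.notEmpty(partitionList, "partitionList is empty");
    Assert.notNull(topK, "topK is null");
    int nprobeVectorSize = vectors.get(0).size();
    // The inner quotes must be escaped, otherwise this line does not compile.
    String paramsInJson = "{\"nprobe\": " + nprobeVectorSize + "}";
    SearchParam searchParam =
            SearchParam.newBuilder().withCollectionName(collectionName)
                    .withParams(paramsInJson)
                    .withMetricType(MetricType.IP)     // same metric as the synchronous searches
                    .withVectors(vectors)
                    .withVectorFieldName("embedding")  // same vector field as the synchronous searches
                    .withTopK(topK)
                    .withPartitionNames(partitionList)
                    .build();
    ListenableFuture<R<SearchResults>> listenableFuture = milvusServiceClient.searchAsync(searchParam);
    // get() blocks the calling thread until the search completes.
    // getTopksList() holds per-query hit counts, not IDs, so read the IDs as in the synchronous searches.
    List<Long> resultIdsList = listenableFuture.get().getData().getResults().getIds().getIntId().getDataList();
    return resultIdsList;
}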
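
Calling get() on the future right after searchAsync blocks the caller until the result arrives, so the method above behaves much like a synchronous search. To benefit from the asynchronous call, the follow-up work is usually attached to the future instead (with Guava's ListenableFuture, through addListener or Futures.addCallback). The sketch below shows the same pattern with the JDK's CompletableFuture as a stand-in so that it runs without any dependencies; fakeSearchAsync is hypothetical and returns hard-coded IDs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class AsyncSearchSketch {

    /** Stand-in for a remote search call; returns hard-coded IDs on another thread. */
    static CompletableFuture<List<Long>> fakeSearchAsync() {
        return CompletableFuture.supplyAsync(() -> Arrays.asList(7L, 3L, 9L));
    }

    /** Attaches a step that runs once the search completes, without blocking the caller. */
    static CompletableFuture<Integer> countHitsAsync() {
        return fakeSearchAsync().thenApply(List::size);
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> pending = countHitsAsync();
        // The caller is free to do other work here while the search runs.
        // join() is used only so the demo waits for the result before the JVM exits.
        System.out.println(pending.join()); // prints 3
    }
}
```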
  • Get the partition list
/**
 * Gets the names of all partitions in a collection.
 * @param collectionName collection name
 * @return partition names
 */
public List<String> getPartitionsList(String collectionName) {
    Assert.notNull(collectionName, "collectionName is null");
    ShowPartitionsParam searchParam = ShowPartitionsParam.newBuilder().withCollectionName(collectionName).build();
    List<ByteString> byteStrings = milvusServiceClient.showPartitions(searchParam).getData().getPartitionNamesList().asByteStringList();
    List<String> partitionList = Lists.newLinkedList();
    // Decode each protobuf ByteString into a Java String.
    byteStrings.forEach(s -> {
        partitionList.add(s.toStringUtf8());
    });
    return partitionList;
}

YAML configuration

milvus:
  config:
    ipAddr: xxx.xxx.xxx.xxx
    port: 19531