Preface

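The Mid-Autumn Festival is just around the corner, so happy Mid-Autumn Festival to everyone in advance!!! It is one of China's traditional festivals, an important time for family reunions, moon gazing, and good food. Many things are closely tied to the festival: the Jade Rabbit, the moon, the toad, eating crab, mooncakes, and so on. To find images that are semantically related to a piece of text, this post shows how to build a simple text-to-image search model.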

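The main idea is to project images and their text descriptions into the same embedding space, so that a text embedding lands close to the embedding of the image it describes. A query then only needs to compute vector similarity and return the top-k images.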
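
To make the "compute similarity, return the top k" step concrete, here is a minimal in-memory sketch. It uses plain Euclidean (L2) distance, the same metric the Milvus search further down is configured with, but does a brute-force scan instead of using an index. The class and method names are made up for illustration and are not part of the demo.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TopKSearchSketch {

    /** Euclidean (L2) distance between two vectors of the same dimension. */
    static double l2Distance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Indices of the k stored vectors closest to the query, nearest first. */
    static List<Integer> topK(float[] query, List<float[]> stored, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) {
            indices.add(i);
        }
        // Brute force: sort every stored vector by its distance to the query
        indices.sort(Comparator.comparingDouble(i -> l2Distance(query, stored.get(i))));
        return new ArrayList<>(indices.subList(0, Math.min(k, indices.size())));
    }

    public static void main(String[] args) {
        List<float[]> stored = List.of(
                new float[]{0.9f, 0.1f},
                new float[]{0.1f, 0.9f},
                new float[]{0.8f, 0.3f});
        // prints [0, 2]
        System.out.println(topK(new float[]{1.0f, 0.0f}, stored, 2));
    }
}
```

A vector database does the same job, except that it uses an index to avoid comparing the query against every stored vector.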

The demo is pretty crude...

Prerequisites

  • Java
  • Milvus
  • A tool that can turn content into vectors

Hands-on

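1. File upload and vectorization. To keep things simple and crude, the file is stored straight on the local disk. OpenAI is used for vectorization, and the embeddings model it provides can only vectorize text, so each image gets a text description instead (you could swap in a model that vectorizes images or video; the approach stays the same).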

@PostMapping(value = "/local", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
public Dict local(@RequestParam("file") MultipartFile file) {
    if (file.isEmpty()) {
        return Dict.create().set("code", 400).set("message", "File is empty");
    }
    String fileName = file.getOriginalFilename();
    // File name without the extension (it doubles as the image description)
    String rawFileName = StrUtil.subBefore(fileName, ".", true);
    String fileType = StrUtil.subAfter(fileName, ".", true);
    // Full local path of the stored file
    String localFilePath = StrUtil.appendIfMissing(fileTempPath, "/") + rawFileName + "-" + DateUtil.current(false) + "." + fileType;
    try {
        file.transferTo(new File(localFilePath));
        List<Float> embeddingList = getFloats(rawFileName);
        Map<String, List<?>> value = new HashMap<>();
        value.put("id", Arrays.asList(UUID.randomUUID().toString()));
        value.put("local_file_path", Arrays.asList(fileName));
        value.put("text_feature", Arrays.asList(embeddingList));
        milvusOperateService.insert("image_test", value);
    } catch (IOException e) {
        log.error("[Local file upload] failed, absolute path: {}", localFilePath, e);
        return Dict.create().set("code", 500).set("message", "File upload failed");
    } catch (Exception e) {
        log.error("[Vectorization] failed, absolute path: {}", localFilePath, e);
        return Dict.create().set("code", 500).set("message", "Vectorization failed");
    }
    log.info("[Local file upload] absolute path: {}", localFilePath);
    return Dict.create().set("code", 200).set("message", "Upload succeeded").set("data", Dict.create().set("fileName", fileName).set("filePath", localFilePath));
}
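
The `getFloats` helper called above is not shown in the post. Below is a rough sketch of what it might look like using only the JDK HTTP client (Java 11+). The endpoint, the `text-embedding-ada-002` model name (which returns 1536-dimensional vectors), and the `OPENAI_API_KEY` environment variable are assumptions on my part, and the JSON handling is deliberately naive; a real project would use a JSON library.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmbeddingClientSketch {
    // Assumed endpoint and model; adjust to whatever embedding service is used
    private static final String ENDPOINT = "https://api.openai.com/v1/embeddings";
    private static final String MODEL = "text-embedding-ada-002";
    private static final Pattern EMBEDDING =
            Pattern.compile("\"embedding\"\\s*:\\s*\\[([^\\]]*)\\]");

    /** Builds the JSON request body; only backslashes and quotes are escaped. */
    static String buildRequestBody(String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"model\":\"" + MODEL + "\",\"input\":\"" + escaped + "\"}";
    }

    /** Extracts the first "embedding" array from the response JSON. */
    static List<Float> parseEmbedding(String json) {
        Matcher m = EMBEDDING.matcher(json);
        if (!m.find()) {
            throw new IllegalStateException("no embedding array in response");
        }
        List<Float> result = new ArrayList<>();
        for (String part : m.group(1).split(",")) {
            String t = part.trim();
            if (!t.isEmpty()) {
                result.add(Float.parseFloat(t));
            }
        }
        return result;
    }

    /** Sends the text to the embeddings endpoint and returns its vector. */
    static List<Float> getFloats(String text) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                // Assumption: the API key is supplied through this environment variable
                .header("Authorization", "Bearer " + System.getenv("OPENAI_API_KEY"))
                .POST(HttpRequest.BodyPublishers.ofString(buildRequestBody(text)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("embedding request failed: " + response.statusCode());
        }
        return parseEmbedding(response.body());
    }
}
```

Whatever model is used, the vector length it returns has to match the dimension of the `text_feature` field in the collection.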

2. Given the input text, find the images whose meaning is closest to it

@GetMapping(value = "/findFeature")
public Dict findFeature(@RequestParam("fileDesc") String fileDesc) {
    List<String> images =new ArrayList<>();
    try {
        List<Float> embeddingList = getFloats(fileDesc);
        Map<String, List> searchByFeature = milvusOperateService.searchByFeature("image_test", 2, "text_feature",
            "{\"ef\":10}","", Arrays.asList("local_file_path"), Arrays.asList(embeddingList));
        for(Object o:searchByFeature.get("local_file_path")){
            images.add((String) o);
        }
    } catch (Exception e) {
        log.error("[Vector search] failed, text: {}", fileDesc, e);
        return Dict.create().set("code", 500).set("message", "Vector search failed");
    }
    return Dict.create().set("code", 200).set("message", "success").set("data",images );
}
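
The search parameters are handed to Milvus as a JSON string, so the inner quotes must be escaped in Java source. The `ef` key suggests the collection uses an HNSW index, and as far as I know Milvus expects `ef` to be at least as large as `topK` for that index type. A small helper (hypothetical, not part of the demo) can take care of both:

```java
class SearchParamsSketch {

    /**
     * Builds the HNSW search-parameter JSON, raising ef to topK when it is smaller.
     * Assumption: the vector field is indexed with HNSW.
     */
    static String hnswParams(int ef, int topK) {
        if (topK <= 0) {
            throw new IllegalArgumentException("topK must be positive");
        }
        return "{\"ef\":" + Math.max(ef, topK) + "}";
    }

    public static void main(String[] args) {
        // prints {"ef":10}
        System.out.println(hnswParams(10, 2));
    }
}
```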

Results

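Enter some text related to Mid-Autumn elements (the matching images and their descriptions have to be uploaded beforehand; I got lazy here and simply wrote the description into the file name). The front end is very simple and rather crude...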

[Image: Mid-Autumn Festival - text-to-image search with a vector database, finding the Chang'e you have in mind]


Vector utility class

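/**
 * Milvus operations; partitions are not taken into account.
 * Data deleted in Milvus is marked as deleted and compacted automatically; it is purged once it is older than the Time Travel retention period.
 */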
@Component
@Slf4j
public class MilvusOperateService {
    // Pool that manages the Milvus client connections
    private GenericObjectPool<MilvusServiceClient> milvusServiceClientGenericObjectPool;
    private MilvusOperateService() {
        // The private constructor creates the object pool factory
        MilvusPoolFactory milvusPoolFactory = new MilvusPoolFactory();
        // Pool configuration (the defaults are fine for now)
        GenericObjectPoolConfig objectPoolConfig = new GenericObjectPoolConfig();
//        int cpu = Runtime.getRuntime().availableProcessors();
//        int minIdle = cpu * 6;
//        objectPoolConfig.setMinIdle(minIdle);
//        objectPoolConfig.setMaxIdle(minIdle * 8);
//        objectPoolConfig.setMaxTotal(minIdle * 16);
        // Settings for removing abandoned objects
        AbandonedConfig abandonedConfig = new AbandonedConfig();
        // Check for leaked objects during pool maintenance
        abandonedConfig.setRemoveAbandonedOnMaintenance(true);
        // Check for leaked objects on borrow
        abandonedConfig.setRemoveAbandonedOnBorrow(true);
        // An object not returned to the pool within 20 seconds of being borrowed is treated as leaked
        abandonedConfig.setRemoveAbandonedTimeout(20);
        // The object pool itself
        milvusServiceClientGenericObjectPool = new GenericObjectPool(milvusPoolFactory, objectPoolConfig);
        milvusServiceClientGenericObjectPool.setAbandonedConfig(abandonedConfig);
        milvusServiceClientGenericObjectPool.setTimeBetweenEvictionRunsMillis(5000); // run the maintenance task every 5 seconds
        log.info("MilvusOperateService-object pool created");
    }
    /**
     * Creates a collection, similar to creating a table in a relational database.
     *
     * @param collection     collection name
     * @param collectionDesc collection description
     * @param fieldTypes     field definitions of the collection
     * @return true if the collection was created
     */
    public Boolean createCollection(String collection, String collectionDesc, List<FieldType> fieldTypes) throws Exception {
        MilvusServiceClient milvusServiceClient = null;
        try {
            // Borrow a client from the pool
            milvusServiceClient = milvusServiceClientGenericObjectPool.borrowObject();
            CreateCollectionParam.Builder builder = CreateCollectionParam.newBuilder()
                    .withCollectionName(collection)
                    .withDescription(collectionDesc);
            for (FieldType fieldType : fieldTypes) {
                builder.addFieldType(fieldType);
            }
            CreateCollectionParam createCollectionReq = builder.build();
            R<RpcStatus> result = milvusServiceClient.createCollection(createCollectionReq);
            log.info("MilvusOperateService-create collection status: {} [0 means success]", result.getStatus());
            return result.getStatus().intValue() == 0;
        } catch (Exception e) {
            log.error("MilvusOperateService-create collection failed", e);
            throw e;
        } finally {
            // Return the client to the pool
            if (milvusServiceClient != null) {
                milvusServiceClientGenericObjectPool.returnObject(milvusServiceClient);
            }
        }
    }
    /**
     * Loads a collection into memory (Milvus requires this before it can be queried).
     *
     * @param collection collection name
     */
    public void loadingLocation(String collection) throws Exception {
        MilvusServiceClient milvusServiceClient = null;
        try {
            // Borrow a client from the pool
            milvusServiceClient = milvusServiceClientGenericObjectPool.borrowObject();
            R<RpcStatus> rpcStatusR = milvusServiceClient.loadCollection(
                    LoadCollectionParam.newBuilder()
                            .withCollectionName(collection)
                            .build());
            log.info("MilvusOperateService-load collection status: {} [0 means success]", rpcStatusR.getStatus());
        } catch (Exception e) {
            log.error("MilvusOperateService-load collection failed", e);
            throw e;
        } finally {
            // Return the client to the pool
            if (milvusServiceClient != null) {
                milvusServiceClientGenericObjectPool.returnObject(milvusServiceClient);
            }
        }
    }
    /**
     * Fetches the statistics of a collection.
     *
     * @param collection collection name
     */
    public GetCollectionStatisticsResponse getCollectionInfo(String collection) throws Exception {
        MilvusServiceClient milvusServiceClient = null;
        try {
            // Borrow a client from the pool
            milvusServiceClient = milvusServiceClientGenericObjectPool.borrowObject();
            R<GetCollectionStatisticsResponse> collectionStatistics = milvusServiceClient.getCollectionStatistics(
                    GetCollectionStatisticsParam.newBuilder()
                            .withCollectionName(collection)
                            .build());
            GetCollStatResponseWrapper wrapperCollectionStatistics = new GetCollStatResponseWrapper(collectionStatistics.getData());
            log.info("Collection row count: {}", wrapperCollectionStatistics.getRowCount());
            return collectionStatistics.getData();
        } catch (Exception e) {
            log.error("MilvusOperateService-get collection statistics failed", e);
            throw e;
        } finally {
            // Return the client to the pool
            if (milvusServiceClient != null) {
                milvusServiceClientGenericObjectPool.returnObject(milvusServiceClient);
            }
        }
    }
    /**
     * Lists all collections.
     */
    public ShowCollectionsResponse getCollectionList() throws Exception {
        MilvusServiceClient milvusServiceClient = null;
        try {
            // Borrow a client from the pool
            milvusServiceClient = milvusServiceClientGenericObjectPool.borrowObject();
            R<ShowCollectionsResponse> collections = milvusServiceClient.showCollections(ShowCollectionsParam.newBuilder().build());
            return collections.getData();
        } catch (Exception e) {
            log.error("MilvusOperateService-list collections failed", e);
            throw e;
        } finally {
            // Return the client to the pool
            if (milvusServiceClient != null) {
                milvusServiceClientGenericObjectPool.returnObject(milvusServiceClient);
            }
        }
    }
    /**
     * Checks whether a collection exists.
     *
     * @param collection collection name
     */
    public Boolean hasCollection(String collection) throws Exception {
        MilvusServiceClient milvusServiceClient = null;
        try {
            // Borrow a client from the pool
            milvusServiceClient = milvusServiceClientGenericObjectPool.borrowObject();
            R<Boolean> hasCollection = milvusServiceClient.hasCollection(
                    HasCollectionParam.newBuilder()
                            .withCollectionName(collection)
                            .build());
            return hasCollection.getData();
        } catch (Exception e) {
            log.error("MilvusOperateService-check collection existence failed", e);
            throw e;
        } finally {
            // Return the client to the pool
            if (milvusServiceClient != null) {
                milvusServiceClientGenericObjectPool.returnObject(milvusServiceClient);
            }
        }
    }
    /**
     * Releases a collection from memory after searching or querying, to reduce memory usage.
     *
     * @param collection collection name
     */
    public void freedLoaction(String collection) throws Exception {
        MilvusServiceClient milvusServiceClient = null;
        try {
            // Borrow a client from the pool
            milvusServiceClient = milvusServiceClientGenericObjectPool.borrowObject();
            R<RpcStatus> rpcStatusR = milvusServiceClient.releaseCollection(
                    ReleaseCollectionParam.newBuilder()
                            .withCollectionName(collection)
                            .build());
            log.info("MilvusOperateService-release collection status: {} [0 means success]", rpcStatusR.getStatus());
        } catch (Exception e) {
            log.error("MilvusOperateService-release collection failed", e);
            throw e;
        } finally {
            // Return the client to the pool
            if (milvusServiceClient != null) {
                milvusServiceClientGenericObjectPool.returnObject(milvusServiceClient);
            }
        }
    }
    /**
     * Drops a collection (it is marked as deleted).
     *
     * @param collection collection name
     */
    private void delCollection(String collection) throws Exception {
        MilvusServiceClient milvusServiceClient = null;
        try {
            // Borrow a client from the pool
            milvusServiceClient = milvusServiceClientGenericObjectPool.borrowObject();
            R<RpcStatus> rpcStatusR = milvusServiceClient.dropCollection(
                    DropCollectionParam.newBuilder()
                            .withCollectionName(collection)
                            .build());
            log.info("MilvusOperateService-drop collection status: {} [0 means success]", rpcStatusR.getStatus());
        } catch (Exception e) {
            log.error("MilvusOperateService-drop collection failed", e);
            throw e;
        } finally {
            // Return the client to the pool
            if (milvusServiceClient != null) {
                milvusServiceClientGenericObjectPool.returnObject(milvusServiceClient);
            }
        }
    }
    /**
     * Inserts one row; each map entry maps a field name to that field's values.
     *
     * @param collectionName collection name
     * @param values         field name to list of values
     * @return the string primary key of the inserted row
     */
    public String insert(String collectionName, Map<String, List<?>> values) throws Exception {
        MilvusServiceClient milvusServiceClient = null;
        try {
            // Borrow a client from the pool
            milvusServiceClient = milvusServiceClientGenericObjectPool.borrowObject();
            List<InsertParam.Field> fields = new ArrayList<>();
            for (String code : values.keySet()) {
                fields.add(new InsertParam.Field(code, values.get(code)));
            }
            InsertParam insertParam = InsertParam.newBuilder()
                    .withCollectionName(collectionName)
                    .withFields(fields)
                    .build();
            R<MutationResult> insertResult = milvusServiceClient.insert(insertParam);
            if (insertResult.getStatus() == 0) {
                return insertResult.getData().getIDs().getStrId().getData(0);
            } else {
                log.error("MilvusOperateService-insert failed, err: {}", insertResult.getMessage());
                throw new RuntimeException(insertResult.getMessage());
            }
//            milvusServiceClient.flush()
        } catch (Exception e) {
            log.error("MilvusOperateService-insert failed", e);
            throw e;
        } finally {
            // Return the client to the pool
            if (milvusServiceClient != null) {
                milvusServiceClientGenericObjectPool.returnObject(milvusServiceClient);
            }
        }
    }
    /**
     * Flushes the data of the given collections.
     *
     * @param collectionNames names of the collections to flush
     */
    public void flush(List<String> collectionNames) throws Exception {
        MilvusServiceClient milvusServiceClient = null;
        try {
            // Borrow a client from the pool
            milvusServiceClient = milvusServiceClientGenericObjectPool.borrowObject();
            FlushParam flushParam = FlushParam.newBuilder()
                    .withCollectionNames(collectionNames)
                    .build();
            R<FlushResponse> responseR = milvusServiceClient.flush(flushParam);
            if (responseR.getStatus() != 0) {
                log.error("MilvusOperateService-flush failed, err: {}", responseR.getMessage());
                throw new RuntimeException(responseR.getMessage());
            }
        } catch (Exception e) {
            log.error("MilvusOperateService-flush failed", e);
            throw e;
        } finally {
            // Return the client to the pool
            if (milvusServiceClient != null) {
                milvusServiceClientGenericObjectPool.returnObject(milvusServiceClient);
            }
        }
    }
    /**
     * Deletes data.
     *
     * @param collectionName collection name
     * @param deleteExpr     boolean expression selecting the rows to delete
     */
    public void delete(String collectionName, String deleteExpr) throws Exception {
        MilvusServiceClient milvusServiceClient = null;
        try {
            // Borrow a client from the pool
            milvusServiceClient = milvusServiceClientGenericObjectPool.borrowObject();
            DeleteParam deleteParam = DeleteParam.newBuilder()
                    .withCollectionName(collectionName)
                    .withExpr(deleteExpr)
                    .build();
            R<MutationResult> deleteResult = milvusServiceClient.delete(deleteParam);
            if (deleteResult.getStatus() != 0) {
                log.error("MilvusOperateService-delete failed, err: {}", deleteResult.getMessage());
                throw new RuntimeException(deleteResult.getMessage());
            }
        } catch (Exception e) {
            log.error("MilvusOperateService-delete failed", e);
            throw e;
        } finally {
            // Return the client to the pool
            if (milvusServiceClient != null) {
                milvusServiceClientGenericObjectPool.returnObject(milvusServiceClient);
            }
        }
    }
    /**
     * Searches data by vector.
     *
     * @param collection         collection name
     * @param topK               number of similar results to return
     * @param VectorFieldName    vector field to search on
     * @param params             search parameters as JSON; they differ per index type
     * @param expr               optional boolean filter expression; ignored when blank
     * @param searchOutputFields fields to return
     * @param searchVectors      vectors to search with
     * @return map from output field name to the values found for the first query vector
     */
    public Map<String, List> searchByFeature(String collection, int topK, String VectorFieldName, String params,String expr,
                                             List<String> searchOutputFields, List<?> searchVectors) throws Exception {
        MilvusServiceClient milvusServiceClient = null;
        try {
            // Borrow a client from the pool
            milvusServiceClient = milvusServiceClientGenericObjectPool.borrowObject();
//            List<String> searchOutputFields = Arrays.asList("user_code", "user_name", "user_code");
            SearchParam.Builder builder = SearchParam.newBuilder();
            builder.withCollectionName(collection)
                    .withMetricType(MetricType.L2)
                    .withOutFields(searchOutputFields)
                    .withTopK(topK)
                    .withVectors(searchVectors)
                    .withVectorFieldName(VectorFieldName)
//                    .withParams("{\"nprobe\":10}")
                    .withParams(params);
            // Only apply the filter when an expression was actually given
            if (StringUtils.isNotBlank(expr)) {
                builder.withExpr(expr);
            }
            SearchParam searchParam = builder.build();
            R<SearchResults> respSearch = milvusServiceClient.search(searchParam);
            if (respSearch.getStatus() == 0) {
                SearchResultsWrapper wrapperSearch = new SearchResultsWrapper(respSearch.getData().getResults());
                Map<String, List> map = new HashMap<>();
                for (String name : searchOutputFields) {
                    // Index 0: results for the first (and here only) query vector
                    map.put(name, wrapperSearch.getFieldData(name, 0));
                }
                return map;
            } else {
                log.error("MilvusOperateService-vector search failed, err: {}", respSearch.getMessage());
                throw new RuntimeException(respSearch.getMessage());
            }
        } catch (Exception e) {
            log.error("MilvusOperateService-vector search failed", e);
            throw e;
        } finally {
            // Return the client to the pool
            if (milvusServiceClient != null) {
                milvusServiceClientGenericObjectPool.returnObject(milvusServiceClient);
            }
        }
    }
    public static void main(String[] args) throws Exception {
        Random ran = new Random();
        List<Long> book_id_array = new ArrayList<>();
        List<Long> word_count_array = new ArrayList<>();
        List<List<Float>> book_intro_array = new ArrayList<>();
        for (long i = 0L; i < 2; ++i) {
            book_id_array.add(i);
            word_count_array.add(i + 10000);
            List<Float> vector = new ArrayList<>();
            for (int k = 0; k < 1536; ++k) {
                vector.add(ran.nextFloat());
            }
            book_intro_array.add(vector);
        }
        System.out.println(book_intro_array);
    }
}
public class MilvusPoolFactory extends BasePooledObjectFactory<MilvusServiceClient> {
    @Override
    public MilvusServiceClient create() throws Exception {
        ConnectParam connectParam = ConnectParam.newBuilder()
                .withHost("192.168.1.68")
                .withPort(19530)
                .build();
        return new MilvusServiceClient(connectParam);
    }
    @Override
    public PooledObject<MilvusServiceClient> wrap(MilvusServiceClient milvusServiceClient) {
        return new DefaultPooledObject<>(milvusServiceClient);
    }
}
<dependency>
  <groupId>io.milvus</groupId>
  <artifactId>milvus-sdk-java</artifactId>
  <exclusions>
    <exclusion>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-slf4j-impl</artifactId>
    </exclusion>
  </exclusions>
  <version>2.2.2</version>
</dependency>

References

  • milvus
  • openai#embeddings