Modelscope Agent in Practice (Part 6): Adding a Sketch-to-Image Capability to Modelscope-Agent (AI Reading Summary)

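Baoyue (包阅) Reading Guide Summary

1. Keywords: Modelscope Agent, sketch-to-image, asynchronous API, API wrapping, tool registration

2. Summary: This article describes how to wrap the asynchronous sketch-to-image API as a tool and add it to Modelscope-Agent. It covers environment preparation, how to submit a generation request and query its status, the core code path, and test cases, so that a sketch plus a text prompt can be turned into a detailed image.

3. Main content:

– Environment preparation

    – Optional reference image link

    – Get familiar with the Modelscope-Agent code framework

– Asynchronous image generation request

    – Send a request containing the prompt, the sketch image link, and other data

    – Receive a response containing the task status and task ID

– Status query

    – Send a GET request to obtain the task status, results, and related information

– Registering a new tool

    – Define the SketchToImage class, including its description, name, and parameters

    – Implement the methods that parse the input arguments and generate the image URL

    – Call the asynchronous generation endpoint and the synchronous result-polling endpoint

– Test cases

    – Test that the SketchToImage class works as intended

    – Test its use in a role-play scenario

– Additional steps

    – Register the tool in modelscope_agent/tools/base.py

    – Add the lazily loaded class reference in modelscope_agent/tools/__init__.py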


Article URL: https://mp.weixin.qq.com/s/fbhqFdN29ZLW1Xyhd83bbQ

Source: mp.weixin.qq.com

Author: 魔搭ModelScope社区 (ModelScope Community)

Published: 2024/8/1 11:19

Original language: Chinese


Tags: Modelscope-Agent, sketch-to-image, AI tool integration, API wrapping, AI application development

Original article content follows


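In this article, we show how to wrap an API that includes an asynchronous call as a tool, so that an agent can invoke it during a chat. The steps are covered in the sections below.

The API added in this article is sketch-to-image (涂鸦作画).

Sketch-to-Image Capability

This is an image-to-image capability: a rough sketch combined with a text description produces a richly detailed image.

The interface used is DashScope's sketch-to-image API (https://help.aliyun.com/zh/dashscope/developer-reference/tongyi-wanxiang-api-for-doodle?spm=a2c4g.11186623.0.0.4e534393H1eB3n)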

Environment Preparation

Environment:

Other:

Optional reference image: http://synthesis-source.oss-accelerate.aliyuncs.com/lingji/datasets/QuickDraw_sketches_final/cat/4503626191994880.png

Get Familiar with the Agent

Get familiar with the Modelscope-Agent code framework:

https://github.com/modelscope/modelscope-agent

Asynchronous Image Generation Request

curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/image2image/image-synthesis' \
--header 'X-DashScope-Async: enable' \
--header 'Authorization: Bearer <your dashscope api token>' \
--header 'Content-Type: application/json' \
--header 'X-DashScope-OssResourceResolve: enable' \
--data '{
  "input": {
    "prompt": "绿色的猫",
    "sketch_image_url": "http://synthesis-source.oss-accelerate.aliyuncs.com/lingji/datasets/QuickDraw_sketches_final/cat/4503626191994880.png"
  },
  "model": "wanx-sketch-to-image-lite"
}'

Example response:

{
  "output": {
    "task_status": "PENDING",
    "task_id": "76a71d5b-8fc5-4d47-8ef8-c16af80951f3"
  },
  "request_id": "1ad6a3f4-8a80-9118-b805-4515376a9404"
}
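The same request can be assembled in Python. The sketch below is only an illustration: it uses the standard library, builds the request without sending it, and `build_submit_request` is a hypothetical helper name. The endpoint URL is the one that appears in the tool code later in this article.

```python
import json
import urllib.request

# Endpoint as used by the tool implementation later in this article.
SUBMIT_URL = ('https://dashscope.aliyuncs.com/api/v1/services/aigc/'
              'image2image/image-synthesis')


def build_submit_request(api_key: str, prompt: str,
                         sketch_image_url: str) -> urllib.request.Request:
    """Build (but do not send) the asynchronous sketch-to-image request."""
    body = {
        'input': {'prompt': prompt, 'sketch_image_url': sketch_image_url},
        'model': 'wanx-sketch-to-image-lite',
    }
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {api_key}',
        # Ask for asynchronous execution; the response then carries a task_id.
        'X-DashScope-Async': 'enable',
    }
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(body).encode('utf-8'),
        headers=headers,
        method='POST')
```

Passing the returned object to `urllib.request.urlopen` would perform the call; keeping construction separate makes the payload easy to inspect before any quota is spent.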

Status Query

curl -X GET \
--header 'Authorization: Bearer <your dashscope api token>' \
https://dashscope.aliyuncs.com/api/v1/tasks/76a71d5b-8fc5-4d47-8ef8-c16af80951f3

Example response:

{
  "request_id": "5441c445-ec10-963e-9c74-8907e507d1e2",
  "output": {
    "task_id": "76a71d5b-8fc5-4d47-8ef8-c16af80951f3",
    "task_status": "SUCCEEDED",
    "submit_time": "2024-07-02 23:07:03.292",
    "scheduled_time": "2024-07-02 23:07:03.317",
    "end_time": "2024-07-02 23:07:15.401",
    "results": [
      {
        "url": "https://dashscope-result-hz.oss-cn-hangzhou.aliyuncs.com/1d/db/20240702/96f6710c/0b3c9685-1683-4843-87b6-f0ce9bfe8972-1.png?Expires=1720019235&OSSAccessKeyId=LTAI5tQZd8AEcZX6KZV4G8qL&Signature=kdVTIwCb9OTr6V0vTRnnqWqpt4Q%3D"
      }
    ],
    "task_metrics": {"TOTAL": 1, "SUCCEEDED": 1, "FAILED": 0}
  },
  "usage": {"image_count": 1}
}
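The two calls are linked by the `task_id`: the submit response supplies it, and the status query puts it in the URL path. A minimal sketch of that hand-off, where `status_url_from_submit_response` is a hypothetical helper name and the error handling is an assumption rather than documented API behaviour:

```python
import json


def status_url_from_submit_response(response_text: str) -> str:
    """Derive the status-query URL from the raw JSON of the submit response."""
    output = json.loads(response_text).get('output', {})
    task_id = output.get('task_id')
    if not task_id:
        # Assumption: without a task_id there is nothing to poll.
        raise ValueError('submit response carries no task_id')
    return f'https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}'
```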

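Registering a New Tool

Walkthrough of the flow:

register_tool: registers the tool at the framework level and gives it a unique name
description, name, and parameters: follow OpenAI's tool-calling format, which makes it easier to generate the tool args
call: the entry-point function that actually executes the tool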

import os
import time

import json
import requests
from modelscope_agent.constants import BASE64_FILES, LOCAL_FILE_PATHS, ApiNames
from modelscope_agent.tools.base import BaseTool, register_tool
from modelscope_agent.utils.utils import get_api_key, get_upload_url
from requests.exceptions import RequestException, Timeout

MAX_RETRY_TIMES = 3
WORK_DIR = os.getenv('CODE_INTERPRETER_WORK_DIR', '/tmp/ci_workspace')


@register_tool('sketch_to_image')
class SketchToImage(BaseTool):
    # "Call the sketch_to_image api to generate an image from an image plus text"
    description = '调用sketch_to_image api通过图片加文本生成图片'
    name = 'sketch_to_image'
    parameters: list = [{
        'name': 'input.sketch_image_url',
        # "Relative path of the photo uploaded by the user"
        'description': '用户上传的照片的相对路径',
        'required': True,
        'type': 'string'
    }, {
        'name': 'input.prompt',
        # "Detailed description of the features the generated image should have"
        'description': '详细描述了希望生成的图像具有什么样的特点',
        'required': True,
        'type': 'string'
    }]

    def call(self, params: str, **kwargs) -> str:
        pass
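To make the role of the decorator concrete, here is a minimal, framework-independent sketch of name-based registration. It is not the Modelscope-Agent implementation: `TOOL_REGISTRY`, `register_tool_sketch`, and `EchoTool` are all hypothetical names used only for illustration.

```python
TOOL_REGISTRY = {}  # hypothetical stand-in for a framework-level registry


def register_tool_sketch(name):
    """Class decorator that stores the class under a unique tool name."""
    def decorator(cls):
        if name in TOOL_REGISTRY:
            raise ValueError(f'tool {name!r} is already registered')
        TOOL_REGISTRY[name] = cls
        return cls
    return decorator


@register_tool_sketch('echo')
class EchoTool:
    # description/name/parameters mirror the shape used by SketchToImage.
    description = 'Return the input unchanged'
    name = 'echo'
    parameters = [{
        'name': 'text',
        'description': 'text to echo',
        'required': True,
        'type': 'string'
    }]

    def call(self, params: str, **kwargs) -> str:
        return params
```

With a registry like this, an agent can look a tool up by the string name listed in its function list and instantiate it only when needed.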

Adding the core sketch-to-image flow to the tool:

Parse the input arguments and generate the image URL

    def _parse_input(self, *args, **kwargs):
        kwargs = super()._parse_files_input(*args, **kwargs)

        # Rebuild nested dicts from dotted keys,
        # e.g. {'input.prompt': 'x'} -> {'input': {'prompt': 'x'}}
        restored_dict = {}
        for key, value in kwargs.items():
            if '.' in key:
                keys = key.split('.')
                temp_dict = restored_dict
                for k in keys[:-1]:
                    temp_dict = temp_dict.setdefault(k, {})
                temp_dict[keys[-1]] = value
            else:
                restored_dict[key] = value
        kwargs = restored_dict

        image_path = kwargs['input'].pop('sketch_image_url', None)
        if image_path and image_path.endswith(('.jpeg', '.png', '.jpg')):
            # Resolve the uploaded file to a local file:// path, then pass it
            # to get_upload_url to obtain a URL for the request
            if LOCAL_FILE_PATHS not in kwargs:
                image_path = f'file://{os.path.join(WORK_DIR,image_path)}'
            else:
                image_path = f'file://{kwargs["local_file_paths"][image_path]}'
            image_url = get_upload_url(
                model='wanx-sketch-to-image-lite',
                file_to_upload=image_path,
                api_key=os.environ.get('DASHSCOPE_API_KEY', ''))
            kwargs['input']['sketch_image_url'] = image_url
        else:
            # "Please upload an image in a supported format first"
            raise ValueError('请先上传一张正确格式的图片')
        kwargs['model'] = 'wanx-sketch-to-image-lite'
        # "Tool parameters for sketch-to-image:"
        print('草图生图的tool参数：', kwargs)
        return kwargs
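The dotted-key handling in `_parse_input` is what turns the flat parameter names (`input.prompt`, `input.sketch_image_url`) into the nested request body the API expects. Isolated from the tool, the idea can be sketched as follows; `restore_nested` is a hypothetical helper name, not part of the framework.

```python
def restore_nested(flat: dict) -> dict:
    """Turn {'a.b': 1} into {'a': {'b': 1}}; keys without dots are copied."""
    nested = {}
    for key, value in flat.items():
        *parents, leaf = key.split('.')
        node = nested
        for part in parents:
            # Create intermediate dicts on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


flat_args = {
    'input.sketch_image_url': 'sketch.png',
    'input.prompt': '绿色的猫',
    'model': 'wanx-sketch-to-image-lite',
}
nested_args = restore_nested(flat_args)
```

Here `nested_args['input']` holds both `sketch_image_url` and `prompt`, while `model` stays at the top level.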

Call the asynchronous generation endpoint

    def call(self, params: str, **kwargs) -> str:
        params = self._verify_args(params)
        if isinstance(params, str):
            return 'Parameter Error'
        if BASE64_FILES in kwargs:
            params[BASE64_FILES] = kwargs[BASE64_FILES]
        remote_parsed_input = self._parse_input(**params)
        remote_parsed_input = json.dumps(remote_parsed_input)
        url = kwargs.get(
            'url',
            'https://dashscope.aliyuncs.com/api/v1/services/aigc/image2image/image-synthesis'
        )
        try:
            self.token = get_api_key(ApiNames.dashscope_api_key, **kwargs)
        except AssertionError:
            raise ValueError('Please set valid DASHSCOPE_API_KEY!')

        retry_times = MAX_RETRY_TIMES
        headers = {
            'Content-Type': 'application/json',
            'Authorization': f'Bearer {self.token}',
            'X-DashScope-Async': 'enable'
        }
        headers['X-DashScope-OssResourceResolve'] = 'enable'
        while retry_times:
            retry_times -= 1
            try:
                # Submit the task; the response carries the task_id to poll
                response = requests.request(
                    'POST', url=url, headers=headers, data=remote_parsed_input)

                if response.status_code != requests.codes.ok:
                    response.raise_for_status()
                origin_result = json.loads(response.content.decode('utf-8'))

                self.final_result = origin_result
                # Block until the asynchronous task finishes
                return self._get_dashscope_image_result()
            except Timeout:
                continue
            except RequestException as e:
                raise ValueError(
                    f'Remote call failed with error code: {e.response.status_code}, '
                    f'error message: {e.response.content.decode("utf-8")}')

        raise ValueError(
            'Remote call max retry times exceeded! Please try to use local call.'
        )
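The control flow in `call` is a bounded retry: only timeouts are retried, any other failure is raised immediately, and exhausting the budget is itself an error. A minimal sketch of that pattern with the network call injected as a function, so it runs offline. `call_with_retry` is a hypothetical helper, and the built-in `TimeoutError` stands in for `requests.exceptions.Timeout` as an assumption.

```python
def call_with_retry(send, max_retries: int = 3):
    """Call send() until it returns; retry only when it raises TimeoutError."""
    for _ in range(max_retries):
        try:
            return send()
        except TimeoutError:
            continue  # transient failure: try again
    raise ValueError('Remote call max retry times exceeded!')


# Usage: a fake sender that times out twice before succeeding.
attempts = []


def flaky_send():
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError('simulated timeout')
    return {'output': {'task_status': 'PENDING', 'task_id': 'demo'}}


result = call_with_retry(flaky_send)
```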

Call the synchronous result-polling endpoint

    def _get_dashscope_result(self):
        if 'task_id' in self.final_result['output']:
            task_id = self.final_result['output']['task_id']
        get_url = f'https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}'
        get_header = {'Authorization': f'Bearer {self.token}'}

        retry_times = MAX_RETRY_TIMES
        while retry_times:
            retry_times -= 1
            try:
                response = requests.request(
                    'GET', url=get_url, headers=get_header)
                if response.status_code != requests.codes.ok:
                    response.raise_for_status()
                origin_result = json.loads(response.content.decode('utf-8'))

                get_result = origin_result
                return get_result
            except Timeout:
                continue
            except RequestException as e:
                raise ValueError(
                    f'Remote call failed with error code: {e.response.status_code}, '
                    f'error message: {e.response.content.decode("utf-8")}')

        raise ValueError(
            'Remote call max retry times exceeded! Please try to use local call.'
        )

    def _get_dashscope_image_result(self):
        try:
            result = self._get_dashscope_result()
            while True:
                result_data = result
                output = result_data.get('output', {})
                task_status = output.get('task_status', '')

                if task_status == 'SUCCEEDED':
                    print('任务已完成')  # "Task completed"
                    # Return the first image as markdown so the chat UI can render it
                    output_url = result['output']['results'][0]['url']
                    return f'![IMAGEGEN]({output_url})'

                elif task_status == 'FAILED':
                    # Default message: "Task failed, please retry"
                    raise Exception(output.get('message', '任务失败，请重试'))
                else:
                    # Not finished yet: wait briefly, then query again
                    time.sleep(0.5)
                    result = self._get_dashscope_result()
                    print(f'Running:{result}')

        except Exception as e:
            raise Exception('get Remote Error:', str(e))
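The polling loop above runs `while True` and so has no upper bound on waiting time. As a hedged variation, the sketch below adds a maximum number of polls and injects both the status fetcher and the sleep function so that it can be exercised without network access. `poll_task`, `max_polls`, and the injected callables are hypothetical names, not part of the original tool.

```python
import time


def poll_task(fetch, interval: float = 0.5, max_polls: int = 120,
              sleep=time.sleep) -> str:
    """Poll fetch() until the task succeeds, fails, or the budget runs out."""
    for _ in range(max_polls):
        output = fetch().get('output', {})
        status = output.get('task_status', '')
        if status == 'SUCCEEDED':
            return output['results'][0]['url']
        if status == 'FAILED':
            raise RuntimeError(output.get('message', 'task failed'))
        sleep(interval)  # still pending: wait before asking again
    raise TimeoutError(f'task still unfinished after {max_polls} polls')


# Usage with canned responses instead of HTTP calls.
responses = iter([
    {'output': {'task_status': 'PENDING'}},
    {'output': {'task_status': 'PENDING'}},
    {'output': {'task_status': 'SUCCEEDED',
                'results': [{'url': 'https://example.com/cat.png'}]}},
])
image_url = poll_task(lambda: next(responses), sleep=lambda seconds: None)
```

Bounding the loop trades a guaranteed return for the possibility of giving up on a slow task; the right limit depends on how long generation usually takes.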

Test cases to verify the functionality

import os

import pytest
from modelscope_agent.tools.dashscope_tools.sketch_to_image import SketchToImage

from modelscope_agent.agents.role_play import RolePlay

IS_FORKED_PR = os.getenv('IS_FORKED_PR', 'false') == 'true'


@pytest.mark.skipif(IS_FORKED_PR, reason='only run modelscope-agent main repo')
def test_sketch_to_image():
    # Call the tool directly with a sketch file and a prompt ("a green cat")
    params = """{'input.sketch_image_url': 'sketch.png', 'input.prompt': '绿色的猫'}"""

    style_repaint = SketchToImage()
    res = style_repaint.call(params)
    assert (res.startswith('![IMAGEGEN](http'))


@pytest.mark.skipif(IS_FORKED_PR, reason='only run modelscope-agent main repo')
def test_sketch_to_image_role():
    # "You play a painter; call the tool with descriptions that are as rich
    # as possible to draw pictures in various styles."
    role_template = '你扮演一个绘画家，用尽可能丰富的描述调用工具绘制各种风格的图画。'

    llm_config = {'model': 'qwen-max', 'model_server': 'dashscope'}

    function_list = ['sketch_to_image']

    bot = RolePlay(
        function_list=function_list, llm=llm_config, instruction=role_template)

    # '[Upload file "sketch.png"], I want a cat with green ears wearing earrings'
    response = bot.run('[上传文件 "sketch.png"],我想要一只绿色耳朵带耳环的猫')
    text = ''
    for chunk in response:
        text += chunk
    print(text)
    assert isinstance(text, str)
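The first test only checks the prefix of the returned string. If a stricter check is wanted, the URL can be pulled out of the `![IMAGEGEN](...)` markdown that the tool returns. This is a small sketch; `image_url_from_result` and the regular expression are assumptions for illustration, not part of the repository's test suite.

```python
import re
from typing import Optional

# Matches the markdown image the tool returns, e.g. ![IMAGEGEN](https://...)
IMAGE_MARKDOWN = re.compile(r'!\[IMAGEGEN\]\((?P<url>https?://[^)\s]+)\)')


def image_url_from_result(result: str) -> Optional[str]:
    """Return the image URL embedded in the tool result, or None if absent."""
    match = IMAGE_MARKDOWN.search(result)
    return match.group('url') if match else None
```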

Additionally, register the tool in modelscope_agent/tools/base.py

register_map = {
    'sketch_to_image':
    'SketchToImage',
    'amap_weather':
    'AMAPWeather',
    'storage':
    'Storage',
    'web_search':
    'WebSearch',
    'image_gen':
    'TextToImageTool',
    'image_gen_lite':
    'TextToImageLiteTool',
    # (excerpt: the remaining entries are not shown in the original)

Add the class reference to the lazy-loading table: modelscope_agent/tools/__init__.py

import sys

from ..utils import _LazyModule
from .contrib import *

_import_structure = {
    'amap_weather': ['AMAPWeather'],
    'code_interpreter': ['CodeInterpreter'],
    'contrib': ['AliyunRenewInstanceTool'],
    'dashscope_tools': [
        'ImageEnhancement', 'TextToImageTool', 'TextToImageLiteTool',
        'ParaformerAsrTool', 'QWenVL', 'SambertTtsTool', 'StyleRepaint',
        'WordArtTexture', 'SketchToImage'
    ],
    # (excerpt: the remaining entries are not shown in the original)
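For readers unfamiliar with lazy loading, the idea behind an import-structure table is that a class name is mapped to the module that defines it, and the module is imported only when the name is first accessed. The sketch below illustrates this with standard-library modules. `LazyNamespace` is a hypothetical class written for this example and makes no claim about how `_LazyModule` is implemented.

```python
import importlib


class LazyNamespace:
    """Resolve exported names to their modules only on first access."""

    def __init__(self, import_structure: dict):
        # Invert {'module': ['ClassA', ...]} into {'ClassA': 'module'}.
        self._name_to_module = {
            name: module
            for module, names in import_structure.items()
            for name in names
        }

    def __getattr__(self, name: str):
        # Invoked only when normal lookup fails, i.e. before the name is cached.
        try:
            module_name = self._name_to_module[name]
        except KeyError:
            raise AttributeError(name) from None
        value = getattr(importlib.import_module(module_name), name)
        setattr(self, name, value)  # cache so the import happens once
        return value


# Usage: standard-library modules stand in for the tool modules.
tools = LazyNamespace({
    'collections': ['OrderedDict'],
    'json': ['JSONDecoder'],
})
ordered_dict_cls = tools.OrderedDict
```

Deferring imports this way keeps startup fast when a package exposes many tools with heavy dependencies but a given agent uses only a few of them.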

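Submit the code as a pull request to the repository

Introduction to AgentFabric capabilities:

Adding the sketch-to-image capability to the application:

Edit tool_config.json in the code so that the new tool can be called: apps/agentfabric/config/tool_config.json

Add and use the newly created tool in AgentFabric

Through external tools, an agent can quickly extend the LLM's capabilities to a much wider range of application scenarios.
Correct tool invocation depends on the LLM's ability to understand and generate instructions, so the instructions must be defined accurately.
Tool calls are synchronous inside the agent. For asynchronous APIs that require waiting, there are several options: one is to wait inside a single tool call until the result is ready and then return it; another is to use multiple tools and let the agent query the result for you.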
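The article implements the first option (wait inside one tool call). For contrast, the second option can be sketched as two separate tools, one that submits and one that queries, leaving it to the agent to decide when to ask again. Everything here is hypothetical: `FakeImageService` is an in-memory stand-in for an asynchronous API, and `submit_tool` and `query_tool` are illustrative functions rather than Modelscope-Agent tools.

```python
import itertools


class FakeImageService:
    """In-memory stand-in for an asynchronous image API (illustration only)."""

    def __init__(self, polls_until_done: int = 2):
        self._ids = itertools.count(1)
        self._remaining = {}
        self._polls_until_done = polls_until_done

    def submit(self, prompt: str) -> str:
        task_id = f'task-{next(self._ids)}'
        self._remaining[task_id] = self._polls_until_done
        return task_id

    def query(self, task_id: str) -> dict:
        if self._remaining[task_id] > 0:
            self._remaining[task_id] -= 1
            return {'task_status': 'PENDING'}
        return {'task_status': 'SUCCEEDED',
                'url': f'https://example.com/{task_id}.png'}


def submit_tool(service: FakeImageService, prompt: str) -> str:
    """Tool 1: start the job and hand the task id back to the agent."""
    return f'task submitted, task_id={service.submit(prompt)}'


def query_tool(service: FakeImageService, task_id: str) -> str:
    """Tool 2: report the current state; the agent decides when to ask again."""
    state = service.query(task_id)
    if state['task_status'] == 'SUCCEEDED':
        return f'![IMAGEGEN]({state["url"]})'
    return f'task {task_id} is still {state["task_status"]}'
```

The single-tool approach keeps the conversation simple at the cost of a blocking call; the two-tool approach returns control to the agent sooner but relies on the LLM to carry the task id across turns.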
