1. Test Cases Generated Directly with ChatGPT

To write tests, we first need a program. To reduce the chance that the problem itself is already in the AI's training data (so that it simply knows the answer), we'll use a fun little exercise: have Python take an integer number of days and format it into a natural-language description of a time span. The rules: 1 week is 7 days, 1 month is 30 days, and 1 year is 365 days. For example, an input of 1 returns 1d, 8 returns 1w1d, 32 returns 1m2d, and 375 returns 1y1w3d.

Requirement:

Write a Python function that formats a time span, where 1 week is 7 days, 1 month is 30 days, and 1 year is 365 days. For example:
Input  Output
1      1d
8      1w1d
61     2m1d
375    1y1w3d
Only formatting up to years (?y?m?w?d) is required.
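To make the spec concrete before involving any AI, the example table can double as a tiny self-check harness. The `format_time` below is a hand-written reference sketch (our own assumption about one reasonable implementation, not the ChatGPT output shown later):

```python
def format_time(days):
    # Reference sketch (an assumption): greedy decomposition using
    # the stated rules 1y = 365d, 1m = 30d, 1w = 7d.
    years, days = divmod(days, 365)
    months, days = divmod(days, 30)
    weeks, days = divmod(days, 7)
    parts = [(years, "y"), (months, "m"), (weeks, "w"), (days, "d")]
    return "".join(f"{n}{unit}" for n, unit in parts if n > 0)

# The requirement's example table doubles as a quick sanity check.
examples = {1: "1d", 8: "1w1d", 61: "2m1d", 375: "1y1w3d"}
for days, expected in examples.items():
    assert format_time(days) == expected, (days, format_time(days))
```

Any implementation the AI produces later can be run against the same table.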

We ask ChatGPT to write the program directly, as follows:

[Screenshot: ChatGPT-generated code]

Since ChatGPT can write code, it can naturally also write the unit tests for us, as follows:

[Screenshots: ChatGPT-generated test cases]

The scenarios covered by these test cases are actually quite comprehensive: they include basic functional checks as well as some abnormal-input cases.

2. Step-by-Step Verification via the OpenAI API

2.1 Breaking the Process into Prompts

OpenAI's examples suggest a good approach: split the problem into several steps.

  • Give the code to the large language model and have it explain what the code does.
  • Give the code together with its explanation back to the model and have it design which test cases we should write for this logic. If there are too few, we can repeatedly ask the AI to generate more.
  • Submit the detailed test-case descriptions to the model and have it generate the actual test code. We then run a syntax check on the generated code; if it fails even that, we ask the AI to regenerate.

2.2 Asking the AI to Explain the Code Under Test

import openai
def gpt35(prompt, model="text-davinci-002", temperature=0.4, max_tokens=1000,
          top_p=1, stop=["\n\n", "\n\t\n", "\n    \n"]):
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        temperature=temperature,
        max_tokens=max_tokens,
        top_p=top_p,
        stop=stop
    )
    message = response["choices"][0]["text"]
    return message
code = """
def format_time(days):
    years, days = divmod(days, 365)
    months, days = divmod(days, 30)
    weeks, days = divmod(days, 7)
    time_str = ""
    if years > 0:
        time_str += str(years) + "y"
    if months > 0:
        time_str += str(months) + "m"
    if weeks > 0:
        time_str += str(weeks) + "w"
    if days > 0:
        time_str += str(days) + "d"
    return time_str
"""
def explain_code(function_to_test, unit_test_package="pytest"):
    prompt = f""""# How to write great unit tests with {unit_test_package}
In this advanced tutorial for experts, we'll use Python 3.8 and `{unit_test_package}` to write a suite of unit tests to verify the behavior of the following function.
```python
{function_to_test}
Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.
- First,"""
    response = gpt35(prompt)
    return response, prompt
code_explaination, prompt_to_explain_code = explain_code(code)
print(code_explaination)

First, we define a gpt35 helper function, which does the following:

  1. It uses the text-davinci-002 model, a text-generation model fine-tuned with supervised learning, in the hope of producing a clear, well-targeted explanation of the code.
  2. It configures stop specially: as soon as two consecutive newlines (or anything resembling two consecutive newlines) appear, generation stops, which keeps the model from running on and generating the test code in the same breath.

Then, through a carefully designed prompt, we have the GPT model explain the code for us:

  • Specify pytest as the testing package.
  • Provide the code under test to the GPT model.
  • Ask the AI to describe precisely what the code does.
  • Finally, end the prompt with `- First,` to guide the GPT model into describing the code under test step by step, one bullet per line.
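The effect of the stop setting can be reproduced locally: the Completion API truncates generated text at the first occurrence of any stop sequence. A minimal sketch of that behavior (the sample completion string is made up for illustration):

```python
def apply_stop(text, stop):
    # Truncate at the earliest occurrence of any stop sequence,
    # mimicking how the Completion API cuts off generation.
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

completion = "- First, it takes days.\n- Next, divmod splits it.\n\ndef test_..."
trimmed = apply_stop(completion, ["\n\n", "\n\t\n", "\n    \n"])
assert "def test_" not in trimmed  # the would-be test code is dropped
```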

Output:

 the function takes an integer value representing days as its sole argument.
- Next, the `divmod` function is used to calculate the number of years and days, the number of months and days, and the number of weeks and days.
- Finally, a string is built up and returned that contains the number of years, months, weeks, and days.

2.3 Having the AI Draft a Test Plan from the Code Explanation

def generate_a_test_plan(full_code_explaination, unit_test_package="pytest"):
    prompt_to_explain_a_plan = f"""
A good unit test suite should aim to:
- Test the function's behavior for a wide range of possible inputs
- Test edge cases that the author may not have foreseen
- Take advantage of the features of `{unit_test_package}` to make the tests easy to write and maintain
- Be easy to read and understand, with clean code and descriptive names
- Be deterministic, so that the tests always pass or fail in the same way
`{unit_test_package}` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.
For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):
-"""
    prompt = full_code_explaination+prompt_to_explain_a_plan
    response = gpt35(prompt)
    return response, prompt
test_plan, prompt_to_get_test_plan = generate_a_test_plan(prompt_to_explain_code+code_explaination)
print(test_plan)

The prompt imposes several requirements on the generated test plan:

  • Test cases should cover a wide range of inputs.
  • Edge cases should reach scenarios the author may not have foreseen.
  • Make full use of pytest's features.
  • Keep the test cases clean and easy to understand.
  • Test outcomes must be deterministic: always pass or always fail.

Output:

 Normal inputs:
    - `days` is a positive integer
    - `days` is 0
- Edge cases:
    - `days` is a negative integer
    - `days` is a float
    - `days` is a string
- Invalid inputs:
    - `days` is `None`
    - `days` is a list
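The plan above is plain text; to count scenarios (for example, to decide whether to ask the AI for more cases) or to feed each one into the next prompt, it helps to parse it into a structure. A rough sketch, assuming the two-level dash/indent format of the sample output:

```python
def parse_test_plan(plan_text):
    # Parse "- Scenario:" bullets with indented sub-bullets into
    # {scenario: [case, ...]}. Format assumptions follow the sample.
    scenarios, current = {}, None
    for line in plan_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("-"):
            continue
        item = stripped.lstrip("- ").rstrip(":")
        if line.startswith("    ") and current is not None:
            scenarios[current].append(item)   # indented sub-bullet
        else:
            current = item                    # top-level scenario
            scenarios[current] = []
    return scenarios

plan = """- Edge cases:
    - `days` is a negative integer
    - `days` is a float
- Invalid inputs:
    - `days` is `None`
"""
assert parse_test_plan(plan)["Invalid inputs"] == ["`days` is `None`"]
```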

2.4 Generating Test Code from the Test Plan

def generate_test_cases(function_to_test, unit_test_package="pytest"):
    starter_comment = "Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator"
    prompt_to_generate_the_unit_test = f"""
Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.
```python
import {unit_test_package}  # used for our unit tests
{function_to_test}
#{starter_comment}"""
    full_unit_test_prompt = prompt_to_explain_code + code_explaination + test_plan + prompt_to_generate_the_unit_test
    return gpt35(model="text-davinci-003", prompt=full_unit_test_prompt, stop="```"), prompt_to_generate_the_unit_test
unit_test_response, prompt_to_generate_the_unit_test = generate_test_cases(code)
print(unit_test_response)

Output:

@pytest.mark.parametrize("days, expected", [
    (1, "1d"),  # normal input
    (7, "1w"),  # normal input
    (30, "1m"),  # normal input
    (365, "1y"),  # normal input
    (731, "2y"),  # normal input
    (-1, pytest.raises(ValueError)),  # abnormal input
    (0, pytest.raises(ValueError)),  # abnormal input
    (1.5, pytest.raises(TypeError)),  # abnormal input
    ("1", pytest.raises(TypeError)),  # abnormal input
])
def test_format_time(days, expected):
    """
    Test the format_time() function.
    """
    if isinstance(expected, type):
        # check that the expected result is a type, i.e. an exception
        with pytest.raises(expected):
            # if so, check that the function raises the expected exception
            format_time(days)
    else:
        # otherwise, check that the function returns the expected value
        assert format_time(days) == expected

2.5 Syntax Checking with the AST Library

Finally, it is best to run one more syntax check on the generated test code, which we can do with Python's ast library. When checking, we need not just the generated test code but also the original function code (taken from the tail of the prompt); the generated fragment on its own would not pass the syntax check.
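As a standalone illustration, ast.parse accepts any syntactically valid source and raises SyntaxError otherwise, which makes a reusable gate easy to write:

```python
import ast

def is_valid_python(source):
    # True if the source parses as Python, False on a SyntaxError.
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

assert is_valid_python("def f():\n    return 1\n")
assert not is_valid_python("def f(:\n")  # malformed signature
```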

import ast
code_start_index = prompt_to_generate_the_unit_test.find("```python\n") + len("```python\n")
code_output = prompt_to_generate_the_unit_test[code_start_index:] + unit_test_response
try:
    ast.parse(code_output)
except SyntaxError as e:
    print(f"Syntax error in generated code: {e}")
print(code_output)

Output:

import pytest  # used for our unit tests
def format_time(days):
    years, days = divmod(days, 365)
    months, days = divmod(days, 30)
    weeks, days = divmod(days, 7)
    time_str = ""
    if years > 0:
        time_str += str(years) + "y"
    if months > 0:
        time_str += str(months) + "m"
    if weeks > 0:
        time_str += str(weeks) + "w"
    if days > 0:
        time_str += str(days) + "d"
    return time_str
#Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator.
#The first element of the tuple is the name of the test case, and the second element is a list of arguments to pass to the function.
#The @pytest.mark.parametrize decorator allows us to write a single test function that can be used to test multiple input values.
@pytest.mark.parametrize("test_input,expected", [
    ("Valid Inputs", [
        (0, "0d"),  # test for 0 days
        (1, "1d"),  # test for 1 day
        (7, "7d"),  # test for 7 days
        (30, "1m"),  # test for 30 days
        (365, "1y"),  # test for 365 days
        (400, "1y35d"),  # test for 400 days
        (800, "2y160d"),  # test for 800 days
        (3650, "10y"),  # test for 3650 days
        (3651, "10y1d"),  # test for 3651 days
    ]),
    ("Invalid Inputs", [
        ("string", None),  # test for string input
        ([], None),  # test for list input
        ((), None),  # test for tuple input
        ({}, None),  # test for set input
        ({1: 1}, None),  # test for dictionary input
        (1.5, None),  # test for float input
        (None, None),  # test for None input
    ]),
    ("Edge Cases", [
        (10000000000, "274247y5m2w6d"),  # test for large positive integer
        (1, "1d"),  # test for small positive integer
        (-10000000000, "-274247y5m2w6d"),  # test for large negative integer
        (-1, "-1d")  # test for small negative integer
    ])
])
def test_format_time(test_input, expected):
    # This test function uses the @pytest.mark.parametrize decorator to loop through each test case.
    # The test_input parameter contains the name of the test case, and the expected parameter contains a list of arguments to pass to the function.
    # The test_input parameter is not used in the test, but is included for readability.
    for days, expected_result in expected:
        # For each argument in the expected parameter, we call the format_time() function and compare the result to the expected result.
        assert format_time(days) == expected_result

Comparing against expectations, some of the generated test cases are still off, for example:

@pytest.mark.parametrize("test_input,expected", [
    ("Valid Inputs", [
        (7, "7d" -> "1w"),  # test for 7 days
        (30, "1m"),  # test for 30 days
        (365, "1y"),  # test for 365 days
        (400, "1y35d" -> "1y1m5d"),  # test for 400 days
        (800, "2y160d" -> "2y5m1w3d"),  # test for 800 days
        (3650, "10y"),  # test for 3650 days
        (3651, "10y1d"),  # test for 3651 days
    ]),
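Applying the stated rules by hand (greedy divmod by 365, then 30, then 7) confirms the corrections, and the fix can be checked mechanically. `format_time` is reproduced here so the check is self-contained:

```python
def format_time(days):
    # Same greedy divmod decomposition as the function under test,
    # reproduced here so this check stands alone.
    years, days = divmod(days, 365)
    months, days = divmod(days, 30)
    weeks, days = divmod(days, 7)
    parts = [(years, "y"), (months, "m"), (weeks, "w"), (days, "d")]
    return "".join(f"{n}{u}" for n, u in parts if n > 0)

# Recomputing the flagged expectations from the stated rules:
assert format_time(7) == "1w"          # not "7d"
assert format_time(400) == "1y1m5d"    # not "1y35d"
assert format_time(800) == "2y2m1w3d"  # not "2y160d"
```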

3. Going Further with LangChain

OpenAI's large language models expose only two simple core interfaces, Completion and Embedding, yet by using them sensibly we can accomplish all kinds of complex tasks.

  • By including the chat history in the prompt, we can have the AI answer questions correctly in context.
  • By indexing Embeddings ahead of time and storing them, we can have the AI answer questions from external knowledge.
  • And through multiple rounds of dialogue, placing the AI's previous answers inside new questions, we can have the AI write unit tests for our own code.

llama-index focuses on building indexes for LLM applications. LangChain has similar functionality, but that is not its main selling point; LangChain's first selling point is right in its name: chained calls.

3.1 Automating Unit Test Generation with LangChain

Above, we generated unit tests through a multi-step prompt sequence. LangChain can call OpenAI's GPT models through multiple prompts in order, a capability that matches this automated-testing workflow exactly.

from langchain import PromptTemplate, OpenAI, LLMChain
from langchain.chains import SequentialChain
import ast
def write_unit_test(function_to_test, unit_test_package="pytest"):
    # Step 1: explain the source code
    explain_code = """# How to write great unit tests with {unit_test_package}
    In this advanced tutorial for experts, we'll use Python 3.8 and `{unit_test_package}` to write a suite of unit tests to verify the behavior of the following function.
    ```python
    {function_to_test}
    ```
    Before writing any unit tests, let's review what each element of the function is doing exactly and what the author's intentions may have been.
    - First,"""
    explain_code_template = PromptTemplate(
        input_variables=["unit_test_package", "function_to_test"],
        template=explain_code
    )
    explain_code_llm = OpenAI(model_name="text-davinci-002", temperature=0.4, max_tokens=1000,
                              top_p=1, stop=["\n\n", "\n\t\n", "\n    \n"])
    explain_code_step = LLMChain(llm=explain_code_llm, prompt=explain_code_template, output_key="code_explaination")
    # Step 2: draft an example test plan
    test_plan = """
    A good unit test suite should aim to:
    - Test the function's behavior for a wide range of possible inputs
    - Test edge cases that the author may not have foreseen
    - Take advantage of the features of `{unit_test_package}` to make the tests easy to write and maintain
    - Be easy to read and understand, with clean code and descriptive names
    - Be deterministic, so that the tests always pass or fail in the same way
    `{unit_test_package}` has many convenient features that make it easy to write and maintain unit tests. We'll use them to write unit tests for the function above.
    For this particular function, we'll want our unit tests to handle the following diverse scenarios (and under each scenario, we include a few examples as sub-bullets):
    -"""
    test_plan_template = PromptTemplate(
        input_variables=["unit_test_package", "function_to_test", "code_explaination"],
        template=explain_code+"{code_explaination}"+test_plan
    )
    test_plan_llm = OpenAI(model_name="text-davinci-002", temperature=0.4, max_tokens=1000,
                           top_p=1, stop=["\n\n", "\n\t\n", "\n    \n"])
    test_plan_step = LLMChain(llm=test_plan_llm, prompt=test_plan_template, output_key="test_plan")
    # Step 3: write the test code
    starter_comment = "Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator"
    prompt_to_generate_the_unit_test = """
Before going into the individual tests, let's first look at the complete suite of unit tests as a cohesive whole. We've added helpful comments to explain what each line does.
```python
import {unit_test_package}  # used for our unit tests
{function_to_test}
#{starter_comment}"""
    unit_test_template = PromptTemplate(
        input_variables=["unit_test_package", "function_to_test", "code_explaination", "test_plan", "starter_comment"],
        template=explain_code+"{code_explaination}"+test_plan+"{test_plan}"+prompt_to_generate_the_unit_test
    )
    unit_test_llm = OpenAI(model_name="text-davinci-002", temperature=0.4, max_tokens=1000, stop="```")
    unit_test_step = LLMChain(llm=unit_test_llm, prompt=unit_test_template, output_key="unit_test")
    sequential_chain = SequentialChain(chains=[explain_code_step, test_plan_step, unit_test_step],
                                       input_variables=["unit_test_package", "function_to_test", "starter_comment"],
                                       verbose=True)
    answer = sequential_chain.run(unit_test_package=unit_test_package, function_to_test=function_to_test,
                                  starter_comment=starter_comment)
    return f"""#{starter_comment}"""+answer
code = """
def format_time(days):
    years, days = divmod(days, 365)
    months, days = divmod(days, 30)
    weeks, days = divmod(days, 7)
    time_str = ""
    if years > 0:
        time_str += str(years) + "y"
    if months > 0:
        time_str += str(months) + "m"
    if weeks > 0:
        time_str += str(weeks) + "w"
    if days > 0:
        time_str += str(days) + "d"
    return time_str
"""
def write_unit_test_automatically(code, retry=3):
    unit_test_code = write_unit_test(code)
    all_code = code+unit_test_code
    tried = 0
    while tried < retry:
        try:
            ast.parse(all_code)
            return all_code
        except SyntaxError as e:
            print(f"Syntax error in generated code: {e}")
            all_code = code+write_unit_test(code)
            tried += 1
    raise RuntimeError("failed to generate syntactically valid tests after retries")
print(write_unit_test_automatically(code))

Output:

def format_time(days):
    years, days = divmod(days, 365)
    months, days = divmod(days, 30)
    weeks, days = divmod(days, 7)
    time_str = ""
    if years > 0:
        time_str += str(years) + "y"
    if months > 0:
        time_str += str(months) + "m"
    if weeks > 0:
        time_str += str(weeks) + "w"
    if days > 0:
        time_str += str(days) + "d"
    return time_str
#Below, each test case is represented by a tuple passed to the @pytest.mark.parametrize decorator.
#The first element of the tuple is the name of the test case, and the second element is a list of tuples.
#Each tuple in the list of tuples represents an individual test.
#The first element of each tuple is the input to the function (days), and the second element is the expected output of the function.
@pytest.mark.parametrize('test_case_name, test_cases', [
    # Test cases for when the days argument is a positive integer
    ('positive_int', [
        (1, '1d'),
        (10, '10d'),
        (100, '1y3m2w1d')
    ]),
    # Test cases for when the days argument is 0
    ('zero', [
        (0, '')
    ]),
    # Test cases for when the days argument is negative
    ('negative_int', [
        (-1, '-1d'),
        (-10, '-10d'),
        (-100, '-1y-3m-2w-1d')
    ]),
    # Test cases for when the days argument is not an integer
    ('non_int', [
        (1.5, pytest.raises(TypeError)),
        ('1', pytest.raises(TypeError))
    ])
])
def test_format_time(days, expected_output):
    # This test function is called once for each test case.
    # days is set to the input for the function, and expected_output is set to the expected output of the function.
    # We can use the pytest.raises context manager to test for exceptions.
    if isinstance(expected_output, type) and issubclass(expected_output, Exception):
        with pytest.raises(expected_output):
            format_time(days)
    else:
        assert format_time(days) == expected_output

4. Summary

To accomplish a complex task with a large language model, we often need to ask the AI several questions, where the answers to earlier questions may form part of the input to later ones. By composing multiple LLMChains into a SequentialChain and executing them in order, LangChain greatly simplifies developing this kind of task.
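The SequentialChain pattern itself is small enough to sketch in plain Python: each step is a prompt template plus an LLM call, and each step's output becomes a named input to later steps. A toy version (with the LLM call stubbed out as a plain function, since no API is involved here):

```python
class Step:
    """One link in the chain: a prompt template plus an LLM call."""
    def __init__(self, template, llm, output_key):
        self.template = template    # str.format-style prompt template
        self.llm = llm              # callable: prompt -> completion
        self.output_key = output_key

    def run(self, variables):
        prompt = self.template.format(**variables)
        variables[self.output_key] = self.llm(prompt)
        return variables

def run_sequential_chain(steps, variables):
    # Each step's output becomes a named input for later steps,
    # mirroring what SequentialChain does.
    for step in steps:
        variables = step.run(dict(variables))
    return variables

# Toy demo: an "LLM" that just wraps its prompt in angle brackets.
echo = lambda p: f"<{p}>"
steps = [
    Step("explain: {code}", echo, "explanation"),
    Step("plan from {explanation}", echo, "plan"),
]
result = run_sequential_chain(steps, {"code": "x = 1"})
print(result["plan"])  # -> <plan from <explain: x = 1>>
```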

