Trying a Local LLM with RAG Using llama-cpp-agent

In the previous post, I introduced llama-cpp-agent, a framework for working with LLMs.

This post continues from there and tries out RAG (Retrieval Augmented Generation).

For reference, llama-cpp-agent is available here:

GitHub - Maximilian-Winter/llama-cpp-agent: The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM models, execute structured function calls and get structured output. Works also with models not fine-tuned to JSON output and function calls.

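Environment

For the base setup, see the previous post.

A few additional packages are required. All of them can be installed with pip.

RAGatouille : a RAG framework that supports adding documents, retrieval, and related operations
chromadb : a database for vector embeddings
pysqlite3-binary : needed because chromadb refuses to run when the bundled sqlite3 version is too old

Here is the updated Dockerfile.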

FROM python:3.10-slim-bullseye

ARG USERNAME=vscode
ARG USER_UID=1000
ARG USER_GID=$USER_UID

ENV LANG ja_JP.UTF-8
ENV LANGUAGE ja_JP:ja
ENV LC_ALL ja_JP.UTF-8
ENV TZ JST-9
ENV TERM xterm

RUN apt-get update \
    && groupadd --gid $USER_GID $USERNAME \
    && useradd -s /bin/bash --uid $USER_UID --gid $USER_GID -m $USERNAME \
    && apt-get install -y sudo \
    && echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME \
    && chmod 0440 /etc/sudoers.d/$USERNAME \
    && apt-get -y install locales \
    && localedef -f UTF-8 -i ja_JP ja_JP.UTF-8

RUN apt install -y build-essential libssl-dev
RUN apt install -y gcc g++

RUN apt -y install cmake
RUN apt-get -y install git

RUN pip install --upgrade pip
RUN pip install --upgrade setuptools

RUN pip install llama-cpp-agent
RUN pip install ragatouille \
    chromadb \
    pysqlite3-binary

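Rebuild the image and enter the container.

Running it

The official sample code did not run as written; it appears not to have kept up with recent changes to the library (as of 2024/06/14).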

RAG- Retrieval Augmented Generation - llama-cpp-agent

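So I fixed things up as needed along the way. I worked in a .ipynb notebook.

If you import chromadb first, it complains that the sqlite version is too old. To avoid that, swap pysqlite3 in as the default sqlite3 module before the import.

Depending on the base Python image, this step may not be necessary.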

__import__('pysqlite3')
import sys
sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')
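
Whether the swap is needed depends on the SQLite version bundled with the Python build. The following is a minimal sketch for checking that up front. The threshold of 3.35.0 is an assumption based on my understanding of Chroma's requirements, and the helper names are made up for illustration, so please confirm the current minimum in the Chroma documentation.

```python
import sqlite3
from typing import Tuple

# Assumption: chromadb needs SQLite 3.35.0 or newer (verify against the Chroma docs).
MIN_SQLITE: Tuple[int, ...] = (3, 35, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '3.34.1' into (3, 34, 1)."""
    return tuple(int(part) for part in version.split("."))


def needs_sqlite_shim(version: str, minimum: Tuple[int, ...] = MIN_SQLITE) -> bool:
    """Return True when the given SQLite version is older than the minimum."""
    return parse_version(version) < minimum


# Check the SQLite library that this Python interpreter is linked against.
current = sqlite3.sqlite_version
if needs_sqlite_shim(current):
    print(f"SQLite {current} looks too old; apply the pysqlite3 swap before importing chromadb")
else:
    print(f"SQLite {current} looks new enough; the swap can probably be skipped")
```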

Next, import the libraries we will use.

Note: RecursiveCharacterTextSplitter is imported from a different location than in the official sample code.

from ragatouille.utils import get_wikipedia_page
from llama_cpp_agent.messages_formatter import MessagesFormatterType

from typing import List
from pydantic import BaseModel, Field

from llama_cpp_agent.llm_agent import LlamaCppAgent
from llama_cpp_agent.gbnf_grammar_generator.gbnf_grammar_from_pydantic_models import generate_gbnf_grammar_and_documentation
from llama_cpp_agent.rag.rag_colbert_reranker import RAGColbertReranker
from llama_cpp_agent.text_utils import RecursiveCharacterTextSplitter
from llama_cpp_agent.llm_output_settings import LlmStructuredOutputSettings, LlmStructuredOutputType

from llama_cpp import Llama
from llama_cpp_agent.providers import LlamaCppPythonProvider

Now we can start building a local LLM with RAG. First, create the RAG object.

We also prepare a splitter that cuts the reference document into chunks of a fixed maximum size.

rag = RAGColbertReranker(persistent=False)
length_function = len
splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " ", ""],
    chunk_size=512,
    chunk_overlap=0,
    length_function=length_function,
    keep_separator=True
)
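
To give a feel for what chunk_size and separators control, here is a simplified, standard-library-only sketch of recursive character splitting. It is not the implementation used by llama-cpp-agent: the function name recursive_split is hypothetical, chunk overlap is ignored, and the way separators are kept may differ from the library. The idea is to try the coarsest separator first and fall back to finer ones only for pieces that are still too long.

```python
from typing import List


def recursive_split(text: str, separators: List[str], chunk_size: int) -> List[str]:
    """Split text into chunks of at most chunk_size characters (illustrative only)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    pieces = text.split(sep)
    # Keep each separator attached to the piece that follows it.
    pieces = [pieces[0]] + [sep + piece for piece in pieces[1:]]

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Too long even on its own: flush and retry with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, finer, chunk_size))
        elif len(current) + len(piece) <= chunk_size:
            current += piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph here.\n\nSecond paragraph is a bit longer.\nWith a second line."
demo_chunks = recursive_split(sample, ["\n\n", "\n", " ", ""], chunk_size=24)
for chunk in demo_chunks:
    print(repr(chunk))
```

In this sketch no characters are dropped, so joining the chunks reproduces the input, and every chunk stays within chunk_size.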

Prepare the information to store, cut it up with the splitter, and add the pieces to the RAG database.

The sample code uses the Wikipedia page for Synthetic_diamond.

page = get_wikipedia_page("Synthetic_diamond")
splits = splitter.split_text(page)
for split in splits:
    rag.add_document(split)

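Next, prepare the LLM. I set both the context length and the maximum number of output tokens to 4096. If you plan to feed in a large amount of reference material, a long-context model such as phi-3-mini-128k is probably a better choice.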

llama_model = Llama('/work/models/Phi-3-mini-4k-instruct-q4.gguf', n_ctx=4096)
provider = LlamaCppPythonProvider(llama_model)
settings = provider.get_provider_default_settings()
settings.max_tokens = 4096
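
One point worth keeping in mind with these settings: the prompt and the generated output share the same context window, so with n_ctx=4096 the model cannot actually produce 4096 new tokens once the prompt takes up space. The sketch below only illustrates that arithmetic. The helper name and the prompt size of 1500 tokens are hypothetical, and how the runtime behaves when the limit is exceeded is not covered here.

```python
def usable_output_tokens(n_ctx: int, prompt_tokens: int, max_tokens: int) -> int:
    """Return how many new tokens can fit in the context window after the prompt."""
    if prompt_tokens >= n_ctx:
        raise ValueError("the prompt alone does not fit in the context window")
    return min(max_tokens, n_ctx - prompt_tokens)


# Hypothetical example: a RAG prompt of about 1500 tokens in a 4096-token window.
budget = usable_output_tokens(n_ctx=4096, prompt_tokens=1500, max_tokens=4096)
print(budget)
```

The more retrieved context is packed into the prompt, the smaller this budget gets, which is one reason to consider a long-context model.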

Define a structure for the list of additional queries that the model should output.

class QueryExtension(BaseModel):
    """
    Represents an extension of a query as additional queries.
    """
    queries: List[str] = Field(default_factory=list, description="List of queries.")

grammar, docs = generate_gbnf_grammar_and_documentation([QueryExtension])

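Create the agent. In the system_prompt, we pass the documentation for the query-list structure defined above and ask the model to come up with additional queries.

Also, although I did not use it last time, MessagesFormatterType has a PHI_3 entry, so I specify that as well.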

query_extension_agent = LlamaCppAgent(
    provider,
    debug_output=True,
    system_prompt="You are a world class query extension algorithm capable of extending queries by writing new queries. Do not answer the queries, simply provide a list of additional queries in JSON format. Structure your output according to the following model:\n\n" + docs.strip(),
    predefined_messages_formatter_type=MessagesFormatterType.PHI_3
)

Prepare the main query. This is the question we ultimately want the LLM to answer.

query = "What is a BARS apparatus?"

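Have the LLM generate additional queries from the main query.

To make the output conform to the QueryExtension structure defined above, pass structured_output_settings. The result is an object that holds the list of additional queries.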

output_settings = LlmStructuredOutputSettings.from_pydantic_models([QueryExtension], output_type=LlmStructuredOutputType.object_instance)
output = query_extension_agent.get_chat_response(
    f"Consider the following query: {query}", structured_output_settings=output_settings)

Let's look at the additional queries that were generated.

output.queries

['What does BARS stand for in an analytical context?',
 'List applications of BARS in chemistry.',
 'Describe the components and functionality of a typical BARS apparatus.',
 'How is data collected using a BARS apparatus?',
 'Compare BARS with other analytical techniques used for elemental analysis.']

The model generated a reasonable set of follow-up questions.
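
For intuition about what "structured output" means here, the sketch below parses a JSON string of the expected shape into a small typed object using only the standard library. This is not how llama-cpp-agent does it (the library uses the pydantic model together with a generated grammar to constrain sampling). The class QueryExtensionSketch, the function parse_query_extension, and the raw_output string are all hypothetical stand-ins.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryExtensionSketch:
    """Stand-in for the QueryExtension model: a list of additional queries."""
    queries: List[str] = field(default_factory=list)


def parse_query_extension(raw: str) -> QueryExtensionSketch:
    """Parse and validate a JSON object of the form {"queries": [str, ...]}."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    queries = data.get("queries", [])
    if not isinstance(queries, list) or not all(isinstance(q, str) for q in queries):
        raise ValueError("'queries' must be a list of strings")
    return QueryExtensionSketch(queries=queries)


# Hypothetical model output, shaped like the structure requested in the system prompt.
raw_output = '{"queries": ["What does BARS stand for?", "How is a BARS press built?"]}'
parsed = parse_query_extension(raw_output)
print(parsed.queries)
```

With grammar-constrained generation the model should not be able to emit anything outside this shape, which is why output.queries can be used directly in the steps that follow.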

Now, for the main query and each additional query, retrieve the related documents from the RAG database.

The text of those documents is then appended to the prompt.

prompt = "Consider the following context:\n==========Context===========\n"
documents = rag.retrieve_documents(query, k=3)
for doc in documents:
    prompt += doc["content"] + "\n\n"

for qu in output.queries:
    documents = rag.retrieve_documents(qu, k=3)
    for doc in documents:
        if doc["content"] not in prompt:
            prompt += doc["content"] + "\n\n"
prompt += "\n======================\nQuestion: " + query
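
The retrieval-and-assembly step above can be tried without any model by swapping in a toy retriever. In the sketch below, toy_retrieve ranks documents by shared words and is only a stand-in for the ColBERT-based retrieval. Both function names and the sample documents are hypothetical, and duplicates are tracked in a list rather than by searching the prompt string as the original code does.

```python
import re
from typing import Callable, List


def toy_retrieve(docs: List[str], query: str, k: int = 3) -> List[str]:
    """Rank documents by the number of words shared with the query (toy scorer)."""
    query_words = set(re.findall(r"\w+", query.lower()))

    def overlap(doc: str) -> int:
        return len(query_words & set(re.findall(r"\w+", doc.lower())))

    return sorted(docs, key=overlap, reverse=True)[:k]


def build_prompt(query: str, extra_queries: List[str],
                 retrieve: Callable[[str, int], List[str]], k: int = 3) -> str:
    """Collect context for the main and extra queries, skipping repeated chunks."""
    seen: List[str] = []
    for q in [query, *extra_queries]:
        for doc in retrieve(q, k):
            if doc not in seen:
                seen.append(doc)
    context = "\n\n".join(seen)
    return ("Consider the following context:\n==========Context===========\n"
            + context + "\n\n======================\nQuestion: " + query)


# Hypothetical mini corpus standing in for the Wikipedia chunks.
docs = [
    "The BARS apparatus is a compact press used to grow synthetic diamond.",
    "Synthetic diamond is widely used in cutting tools.",
    "The thermal conductivity of diamond is very high.",
]
demo_prompt = build_prompt(
    "What is a BARS apparatus?",
    ["How does a BARS press work?"],
    lambda q, k: toy_retrieve(docs, q, k),
    k=2,
)
print(demo_prompt)
```

Because the same chunk is often returned for several queries, the deduplication keeps the prompt from growing with repeated text.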

Printing the prompt confirms that we now have a question with the retrieved reference material attached.

print(prompt)

Consider the following context:
==========Context===========
on the anvils to achieve the same pressure. An alternative is to decrease the surface area to volume ratio of the pressurized volume, by using more anvils to converge upon a higher-order platonic solid, such as a dodecahedron. However, such a press would be complex and difficult to manufacture.

The BARS apparatus is claimed to be the most compact, efficient, and economical of all the diamond-producing presses. In the center of a BARS device, there is a ceramic cylindrical "synthesis capsule" of about 2 

cm3 (0.12 cu in) in size. The cell is placed into a cube of pressure-transmitting material, such as pyrophyllite ceramics, which is pressed by inner anvils made from cemented carbide (e.g., tungsten carbide or VK10 hard alloy). The outer octahedral cavity is pressed by 8 steel outer anvils. After mounting, the whole assembly is locked in a disc-type barrel with a diameter about 1 m (3 ft 3 in). The barrel is filled with oil, which pressurizes upon heating, and the oil pressure is transferred to the central cell. 

These probes consist of a pair of battery-powered thermistors mounted in a fine copper tip. One thermistor functions as a heating device while the other measures the temperature of the copper tip: if the stone being tested is a diamond, it will conduct the tip's thermal energy rapidly enough to produce a measurable temperature drop. This test takes about 2–3 seconds.


== Applications ==


=== Machining and cutting tools ===

Most industrial applications of synthetic diamond have long been associated with 

including lab-grown diamonds within the scope of the definition of "diamond". The revised guide further states that "If a marketer uses 'synthetic' to imply that a competitor's lab-grown diamond is not an actual diamond, ... this would be deceptive." In July 2019, the third party diamond certification lab GIA (Gemological Institute of America) dropped the word 'synthetic' from its certification process and report for lab-grown diamonds, according to the FTC revision.




== Bibliography ==
Barnard, A. S. (2000). The diamond formula: diamond synthesis-a gemological perspective. Butterworth-Heinemann. ISBN 978-0-7506-4244-6.
O'Donoghue, Michael (2006). Gems: their sources, descriptions and identification. Butterworth-Heinemann. ISBN 978-0-7506-5856-0.
Spear, K. E. & Dismukes, J. P. (1994). Synthetic diamond. Wiley-IEEE. ISBN 978-0-471-53589-8.
Lundblad, Erik (1988). Om konsten att göra diamanter. In Daedalus 1988. ISBN 9176160181


== External links ==



In the HPHT method, there are three main press designs used to supply the pressure and temperature necessary to produce synthetic diamond: the belt press, the cubic press and the split-sphere (BARS) press. Diamond seeds are placed at the bottom of the press.


Electronic applications of synthetic diamond are being developed, including high-power switches at power stations, high-frequency field-effect transistors and light-emitting diodes. Synthetic diamond detectors of ultraviolet (UV) light or high-energy particles are used at high-energy research facilities and are available commercially. Due to its unique combination of thermal and chemical stability, low thermal expansion and high optical transparency in a wide spectral range, synthetic diamond is becoming 

For instance, pure diamond is an electrical insulator, but diamond with boron added is an electrical conductor (and, in some cases, a superconductor), allowing it to be used in electronic applications. Nitrogen impurities hinder movement of lattice dislocations (defects within the crystal structure) and put the lattice under compressive stress, thereby increasing hardness and toughness.


=== Thermal conductivity ===


diamonds were chemically identical, their physical properties were not the same. The colorless stones produced strong fluorescence and phosphorescence under short-wavelength ultraviolet light, but were inert under long-wave UV. Among natural diamonds, only the rarer blue gems exhibit these properties. Unlike natural diamonds, all the GE stones showed strong yellow fluorescence under X-rays. The De Beers Diamond Research Laboratory has grown stones of up to 25 carats (5.0 g) for research purposes. Stable 

mining and cutting tools. Polycrystalline diamond is often described by the average size (or grain size) of the crystals that make it up. Grain sizes range from nanometers to hundreds of micrometers, usually referred to as "nanocrystalline" and "microcrystalline" diamond, respectively.


=== Hardness ===
The hardness of diamond is 10 on the Mohs scale of mineral hardness, the hardest known material on this scale. Diamond is also the hardest known natural material for its resistance to indentation. The 


======================
Question: What is a BARS apparatus?

Now let's give this prompt to the LLM.

agent_with_rag_information = LlamaCppAgent(
    provider,
    debug_output=True,
    system_prompt="You are an advanced AI assistant, trained by OpenAI. Only answer question based on the context information provided.",
    predefined_messages_formatter_type=MessagesFormatterType.PHI_3
)
agent_output = agent_with_rag_information.get_chat_response(prompt, llm_sampling_settings=settings)
print(agent_output)

The result was as follows.

The BARS apparatus refers to an innovative design used in synthetic diamond production, specifically within the High Pressure-High Temperature (HPHT) method. It consists of a central ceramic cylindrical "synthesis capsule" about 2 cm³ (0.12 cubic inches) in size, surrounded by pressure-transmitting material like pyrophyllite ceramics. This setup is encapsulated within an octahedral outer cavity of steel anvils and secured in a disc-shaped barrel with a diameter of approximately 1 meter (3 feet 3 inches). The oil inside the barrel applies pressure when heated, thereby facilitating diamond formation under high temperature and pressure conditions

The answer draws on the retrieved material nicely.

For comparison, if we skip RAG and ask the plain LLM about BARS, we get the following response.

# No retrieved context this time: the bare question is the whole prompt.
prompt = query
agent_without_rag = LlamaCppAgent(
    provider,
    debug_output=True,
    system_prompt="You are an advanced AI assistant",
    predefined_messages_formatter_type=MessagesFormatterType.PHI_3
)
agent_output = agent_without_rag.get_chat_response(prompt, llm_sampling_settings=settings)
print(agent_output)

A BARS (Behavioral Anchored Rating Scale) apparatus, or system, refers to a performance evaluation method used in human resources and management. This approach focuses on rating employees' job-related behaviors rather than traditional subjective criteria such as traits or general work habits.

The BARS system involves the following steps:

1. Identifying key critical incidents or examples of effective or poor performance related to specific job responsibilities and competencies. These incidents often involve situations where an employee's actions have a significant impact on their overall job effectiveness.

2. Creating behavioral anchors for each rating level, which serve as concrete, observable descriptions that correspond to the various levels of performance in the BARS system. The behavioral anchors are based upon the previously identified critical incidents and may be developed using input from subject matter experts (SMEs), supervisors, or employees themselves.

3. Developing a rating scale by arranging these behavioral anchors along a continuum of performance levels in ascending order. This helps to provide clear expectations for the employee being evaluated and facilitates more accurate assessment of their job-related behaviors.

The BARS system is an effective method for improving both consistency and accuracy in evaluating employees' work performance, as it focuses on specific examples of behavior that directly relate to job requirements, rather than relying solely on subjective judgments or personal opinions about the employee. It also provides a strong foundation for feedback and development conversations between managers and their subordinates.

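The model started talking about an entirely different BARS. Supplying the relevant information through RAG is presumably what led to the better answer.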