The Power of Re-Ranking: Strengthening RAG Systems

Introduction

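This article takes a closer look at re-ranking techniques in RAG (Retrieval-Augmented Generation) systems and explains how they improve the relevance and accuracy of the retrieved information. It also walks through an implementation step by step.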

Improving RAG Performance with Re-Ranking

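As datasets grow in size and complexity, selecting the relevant information becomes essential for returning appropriate answers to complex queries. A family of techniques called re-ranking serves this purpose: it identifies the important chunks in the text, reorders the documents, and returns the most relevant ones first.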

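There are two main approaches to re-ranking:

  1. Use a re-ranking model as an alternative to an embedding model. It takes the query and the context as input and returns a similarity score instead of an embedding.
  2. Use an LLM (large language model) to capture the semantic information in the documents efficiently.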
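The first approach can be sketched in a few lines. This is a minimal illustration, not the code of any particular library: `overlap_score` is a hypothetical toy scorer standing in for a real re-ranking model, and `rerank` is an illustrative helper name. The point is only that the scorer sees the query and the passage together and returns a score, rather than producing one embedding per text.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in for a re-ranking model (assumption, not a real model):
    the fraction of query words that also appear in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


def rerank(
    query: str,
    passages: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair directly and keep the top_n."""
    scored = [(passage, score_fn(query, passage)) for passage in passages]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


passages = [
    "Stone moved to Greenwich Village in 2009",
    "Emma Stone and Ryan Gosling starred in La La Land",
    "The Wind in the Willows is a children's novel",
]
top = rerank("Emma Stone Ryan Gosling", passages, overlap_score, top_n=2)
for passage, score in top:
    print(f"{score:.2f}  {passage}")
```

In a real system `score_fn` would call a trained model; the surrounding sort-and-truncate logic stays the same.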

Before applying these re-ranking approaches, let's evaluate the top three chunks that the baseline RAG system returns for the second query:

from llama_index.core.response.notebook_utils import display_source_node

# `index` is the vector index of the baseline RAG system.
retriever = index.as_retriever(similarity_top_k=3)
query = "Compare the families of Emma Stone and Ryan Gosling"
nodes = retriever.retrieve(query)
for node in nodes:
    print('----------------------------------------------------')
    display_source_node(node, source_length=500)

This is the output before re-ranking. Each chunk has a node ID and a similarity score.

Node ID: 9b3817fe-3a3f-4417-83d2-2e2996c8b467
Similarity: 0.8415899563985404
Text: Emily Jean "Emma" Stone (born November 6, 1988) is an American actress and producer. She is the recipient of various accolades, including two Academy Awards, two British Academy Film Awards, and two Golden Globe Awards. In 2017, she was the world's highest-paid actress and named by Time magazine as one of the 100 most influential people in the world. Born and raised in Scottsdale, Arizona, Stone began acting as a child in a theater production of The Wind in the Willows in 2000. As a teenager,...

Node ID: 2bef0308-8b0f-4f7e-9cd6-92ce5acf811g
Similarity: 0.831147173341674
Text: Coincidentally, Gosling turned down the Beast role in Beauty and the Beast in favor of La La Land. Chazelle subsequently decided to make his characters somewhat older, with experience in struggling to make their dreams, rather than younger newcomers just arriving in Los Angeles. Emma Stone plays Mia, an aspiring actress in Los Angeles. Stone has loved musicals since she saw Les Misérables when she was eight years old. She said "bursting into song has always been a real dream of mine", and her ...

Node ID: 876ae445-b12e-4d20-99b7-5e5a91ee7d77
Similarity: 0.8289486590392277
Text: Stone was named the best-dressed woman of 2012 by Vogue and was included on similar listings by Glamour in 2013 and 2015, and People in 2014.== Personal life == Stone moved from Los Angeles to Greenwich Village, New York, in 2009. In 2016, she moved back to Los Angeles. Despite significant media attention, she refuses to publicly discuss her personal life. Concerned with living a normal life, Stone has said she dislikes receiving paparazzi attention outside her home. She has expressed her ...
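The similarity values in the output above come from comparing the query embedding with each chunk embedding. As a rough sketch of that baseline step, the following assumes cosine similarity and uses made-up 3-dimensional vectors with hypothetical chunk names; real embeddings have hundreds of dimensions and the actual retriever may differ in detail.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: List[float],
    chunk_vecs: Dict[str, List[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k chunk IDs whose embeddings are most similar to the query."""
    scored = [
        (chunk_id, cosine_similarity(query_vec, vec))
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (assumption): invented purely for illustration.
chunk_vecs = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.8, 0.6, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
    "chunk-d": [0.6, 0.8, 0.0],
}
query_vec = [1.0, 0.0, 0.0]
results = top_k(query_vec, chunk_vecs, k=3)
for chunk_id, score in results:
    print(chunk_id, round(score, 3))
```

Because each text is embedded independently of the query, the scores of loosely related chunks tend to sit close together, which is the situation re-ranking is meant to improve.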

Re-Ranking with FlagEmbeddingReranker

Next, we retrieve the relevant chunks with bge-reranker-base, an open-source re-ranking model available from Hugging Face.

import os
from google.colab import userdata  # userdata.get assumes a Google Colab notebook
HF_TOKEN = userdata.get('HF_TOKEN')
os.environ['HF_TOKEN'] = HF_TOKEN

!pip install llama-index-postprocessor-flag-embedding-reranker
!pip install git+https://github.com/FlagOpen/FlagEmbedding.git

from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker
from llama_index.core.schema import QueryBundle

# Keep the three highest-scoring chunks according to the re-ranking model.
reranker = FlagEmbeddingReranker(
    top_n=3,
    model="BAAI/bge-reranker-base",
)
query_bundle = QueryBundle(query_str=query)
# Re-score the retrieved nodes against the query and reorder them.
ranked_nodes = reranker.postprocess_nodes(nodes, query_bundle=query_bundle)
for ranked_node in ranked_nodes:
    print('----------------------------------------------------')
    display_source_node(ranked_node, source_length=500)

This is the result after re-ranking:

Node ID: 9b3817fe-3a3f-4417-83d2-2e2996c8b467
Similarity: 3.0143558979034424
Text: Emily Jean "Emma" Stone (born November 6, 1988) is an American actress and producer. She is the recipient of various accolades, including two Academy Awards, two British Academy Film Awards, and two Golden Globe Awards. In 2017, she was the world's highest-paid actress and named by Time magazine as one of the 100 most influential people in the world. Born and raised in Scottsdale, Arizona, Stone began acting as a child in a theater production of The Wind in the Willows in 2000. As a teenager,...

Node ID: 876ae445-b12e-4d20-99b7-5e5a91ee7d77
Similarity: 2.2117154598236084
Text: Stone was named the best-dressed woman of 2012 by Vogue and was included on similar listings by Glamour in 2013 and 2015, and People in 2014.== Personal life == Stone moved from Los Angeles to Greenwich Village, New York, in 2009. In 2016, she moved back to Los Angeles. Despite significant media attention, she refuses to publicly discuss her personal life. Concerned with living a normal life, Stone has said she dislikes receiving paparazzi attention outside her home. She has expressed her ...

Node ID: 2bef0308-8b0f-4f7e-9cd6-92ce5acf811g
Similarity: 1.6185210943222046
Text: Coincidentally, Gosling turned down the Beast role in Beauty and the Beast in favor of La La Land. Chazelle subsequently decided to make his characters somewhat older, with experience in struggling to make their dreams, rather than younger newcomers just arriving in Los Angeles.Emma Stone plays Mia, an aspiring actress in Los Angeles. Stone has loved musicals since she saw Les Misérables when she was eight years old. She said "bursting into song has always been a real dream of mine", and her ...

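With re-ranking, the relevance scores are spread much further apart (roughly 1.6 to 3.0, compared with 0.83 to 0.84 before) and some nodes change position: the chunk about Stone's personal life moves up from third to second place, and the La La Land chunk drops to third.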
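The change in spread and order can be checked directly from the two outputs above. The sketch below copies the scores from those outputs and keys them by the first eight characters of each node ID; `score_spread` and `ranking` are illustrative helper names. Embedding similarities and re-ranker scores are on different scales, so the comparison of spreads is only indicative.

```python
from typing import Dict, List

# Scores copied from the outputs before and after re-ranking.
before: Dict[str, float] = {
    "9b3817fe": 0.8415899563985404,
    "2bef0308": 0.831147173341674,
    "876ae445": 0.8289486590392277,
}
after: Dict[str, float] = {
    "9b3817fe": 3.0143558979034424,
    "876ae445": 2.2117154598236084,
    "2bef0308": 1.6185210943222046,
}


def score_spread(scores: Dict[str, float]) -> float:
    """Difference between the highest and the lowest score."""
    return max(scores.values()) - min(scores.values())


def ranking(scores: Dict[str, float]) -> List[str]:
    """Node IDs ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


spread_before = score_spread(before)
spread_after = score_spread(after)
order_before = ranking(before)
order_after = ranking(after)

# Nodes whose position differs between the two rankings.
moved = [
    node_id
    for node_id in order_before
    if order_before.index(node_id) != order_after.index(node_id)
]

print("spread before:", spread_before)
print("spread after: ", spread_after)
print("moved nodes:  ", moved)
```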

Re-Ranking with RankGPTRerank

Next, we use the RankGPT module, which relies on the capabilities of a GPT model to rank the documents.

!pip install llama-index-postprocessor-rankgpt-rerank

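from llama_index.core.schema import QueryBundle
from llama_index.llms.openai import OpenAI
from llama_index.postprocessor.rankgpt_rerank import RankGPTRerank

# The OpenAI client reads its key from the OPENAI_API_KEY environment variable.
reranker = RankGPTRerank(
    top_n=3,
    llm=OpenAI(model="gpt-3.5-turbo-0125"),
)
query_bundle = QueryBundle(query_str=query)
# Ask the LLM to order the retrieved nodes by relevance to the query.
ranked_nodes = reranker.postprocess_nodes(nodes, query_bundle=query_bundle)
for ranked_node in ranked_nodes:
    print('----------------------------------------------------')
    display_source_node(ranked_node, source_length=500)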

With RankGPT, we get the following chunks:

Node ID: 2bef0308-8b0f-4f7e-9cd6-92ce5acf811g
Similarity: 1.6185210943222046
Text: Coincidentally, Gosling turned down the Beast role in Beauty and the Beast in favor of La La Land. Chazelle subsequently decided to make his characters somewhat older, with experience in struggling to make their dreams, rather than younger newcomers just arriving in Los Angeles.Emma Stone plays Mia, an aspiring actress in Los Angeles. Stone has loved musicals since she saw Les Misérables when she was eight years old. She said "bursting into song has always been a real dream of mine", and her ...

Node ID: 9b3817fe-3a3f-4417-83d2-2e2996c8b467
Similarity: 3.0143558979034424
Text: Emily Jean "Emma" Stone (born November 6, 1988) is an American actress and producer. She is the recipient of various accolades, including two Academy Awards, two British Academy Film Awards, and two Golden Globe Awards. In 2017, she was the world's highest-paid actress and named by Time magazine as one of the 100 most influential people in the world. Born and raised in Scottsdale, Arizona, Stone began acting as a child in a theater production of The Wind in the Willows in 2000. As a teenager,...

Node ID: 876ae445-b12e-4d20-99b7-5e5a91ee7d77
Similarity: 2.2117154598236084
Text: Stone was named the best-dressed woman of 2012 by Vogue and was included on similar listings by Glamour in 2013 and 2015, and People in 2014.== Personal life == Stone moved from Los Angeles to Greenwich Village, New York, in 2009. In 2016, she moved back to Los Angeles. Despite significant media attention, she refuses to publicly discuss her personal life. Concerned with living a normal life, Stone has said she dislikes receiving paparazzi attention outside her home. She has expressed her ...

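RankGPT does not reorder the nodes by similarity score, so the scores in the output are not updated; the values shown are the ones left by the FlagEmbeddingReranker step. Even so, the node ranked most relevant is now the one that mentions both Emma Stone and Ryan Gosling.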
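One way to picture why the scores stay unchanged: an LLM-based ranker of this kind typically returns an ordering of passage numbers rather than a numeric score per passage. The following is a minimal sketch of that idea, not the RankGPTRerank implementation; the response string `"[2] > [1] > [3]"` and the helper names `parse_permutation` and `reorder` are assumptions made for illustration.

```python
import re
from typing import List


def parse_permutation(response: str, num_passages: int) -> List[int]:
    """Turn a ranking string such as "[2] > [1] > [3]" into zero-based indices."""
    order: List[int] = []
    for match in re.findall(r"\[(\d+)\]", response):
        index = int(match) - 1
        # Ignore out-of-range numbers and repeated entries.
        if 0 <= index < num_passages and index not in order:
            order.append(index)
    # Append anything the model left out, keeping the original order.
    order.extend(i for i in range(num_passages) if i not in order)
    return order


def reorder(passages: List[str], response: str) -> List[str]:
    """Apply the parsed ordering; no scores are computed or changed."""
    return [passages[i] for i in parse_permutation(response, len(passages))]


# Retrieval order of the three nodes, identified by shortened node IDs.
retrieved = ["9b3817fe", "2bef0308", "876ae445"]
# Hypothetical LLM response; a real response depends on the model and prompt.
llm_response = "[2] > [1] > [3]"
reranked = reorder(retrieved, llm_response)
print(reranked)
```

Since only the order is taken from the model, any score attached to a node beforehand simply travels with it.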

Conclusion

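Introducing re-ranking can substantially improve the quality of the answers a RAG system produces. Tools such as FlagEmbeddingReranker and RankGPTRerank help raise the relevance and accuracy of the retrieved information and so provide a better user experience. As a next step, try LLM-based re-ranking approaches and consider how they might improve performance further.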