
Using LangChain

LangChain์€ LM(Large Language)์„ ํŽธ๋ฆฌํ•˜๊ฒŒ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ๋„์™€์ฃผ๋Š” Framework์ž…๋‹ˆ๋‹ค.

Basic

LangChain Basic์—์„œ๋Š” LangChain์˜ ๊ฐ ๊ตฌ์„ฑ๋ณ„ Sample ์ฝ”๋“œ๋ฅผ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

Falcon FM์—์„œ LangChain ์‚ฌ์šฉํ•˜๊ธฐ

This section explains how to apply LangChain to a SageMaker Endpoint created with Falcon FM. It uses the SageMaker Endpoint (e.g., jumpstart-dft-hf-llm-falcon-7b-instruct-bf16) obtained in "Installing Falcon FM with SageMaker JumpStart".

Declaring LangChain for a SageMaker Endpoint

Falcon์˜ ์ž…๋ ฅ๊ณผ ์ถœ๋ ฅ์„ ์ฐธ์กฐํ•˜์—ฌ ์•„๋ž˜์™€ ๊ฐ™์ด ContentHandler์˜ transform_input, transform_output์„ ๋“ฑ๋กํ•ฉ๋‹ˆ๋‹ค.

import json

from langchain import PromptTemplate, SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler

class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        # serialize the prompt and generation parameters into the JSON body Falcon expects
        input_str = json.dumps({'inputs': prompt, 'parameters': model_kwargs})
        return input_str.encode('utf-8')

    def transform_output(self, output: bytes) -> str:
        # the endpoint returns a JSON list; take the generated text of the first item
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json[0]["generated_text"]
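
For reference, the role of the two handler methods can be seen in a minimal sketch that invokes the endpoint directly through the SageMaker runtime. This only approximates what SagemakerEndpoint does internally, so the invocation details are an assumption:

import boto3

runtime = boto3.client("sagemaker-runtime")
handler = ContentHandler()

# transform_input builds the request body; transform_output parses the response body
body = handler.transform_input("Tell me a joke", {"max_new_tokens": 300})
response = runtime.invoke_endpoint(
    EndpointName="jumpstart-dft-hf-llm-falcon-7b-instruct-bf16",
    ContentType=handler.content_type,
    Accept=handler.accepts,
    Body=body,
)
print(handler.transform_output(response["Body"]))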

์•„๋ž˜์™€ ๊ฐ™์ด endpoint_name, aws_region, parameters, content_handler์„ ์ด์šฉํ•˜์—ฌ Sagemaker Endpoint์— ๋Œ€ํ•œ llm์„ ๋“ฑ๋กํ•ฉ๋‹ˆ๋‹ค.

import boto3

endpoint_name = 'jumpstart-dft-hf-llm-falcon-7b-instruct-bf16'
aws_region = boto3.Session().region_name

# generation parameters passed to the model on every request
parameters = {
    "max_new_tokens": 300
}
content_handler = ContentHandler()

llm = SagemakerEndpoint(
    endpoint_name = endpoint_name,
    region_name = aws_region,
    model_kwargs = parameters,
    content_handler = content_handler
)
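
Additional Falcon generation parameters can be passed through model_kwargs in the same way. A hedged example; the names follow the Hugging Face text-generation interface, which the JumpStart Falcon endpoint is assumed to accept:

parameters = {
    "max_new_tokens": 300,   # upper bound on generated tokens
    "do_sample": True,       # sample instead of greedy decoding
    "temperature": 0.7,      # sampling temperature
    "top_p": 0.9             # nucleus sampling threshold
}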

llm์˜ ๋™์ž‘์€ ์•„๋ž˜์™€ ๊ฐ™์ด ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

llm("Tell me a joke")

์ด๋•Œ์˜ ๊ฒฐ๊ณผ๋Š” ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.

I once told a joke to a friend, but it didn't work. He just looked

Web Loader

Using LangChain's Web loader, you can load a web page and build an index over it.

from langchain.document_loaders import WebBaseLoader
from langchain.indexes import VectorstoreIndexCreator

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
index = VectorstoreIndexCreator().from_loaders([loader])
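
The resulting index can then be queried. A minimal sketch; note that VectorstoreIndexCreator needs an embedding model (it defaults to OpenAI embeddings unless one is passed), and the question below is only an example:

answer = index.query("What is task decomposition?", llm=llm)
print(answer)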

Prompt Template

์•„๋ž˜์™€ ๊ฐ™์ด template๋ฅผ ์ •์˜ํ›„์— LLMChain์„ ์ •์˜ํ›„ run์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์„ธ๋ถ€ ๋‚ด์šฉ์€ langchain-sagemaker-endpoint-Q&A.ipynb์„ ์ฐธ์กฐํ•ฉ๋‹ˆ๋‹ค.

from langchain import PromptTemplate, LLMChain

template = "Tell me a {adjective} joke about {content}."
prompt = PromptTemplate.from_template(template)

llm_chain = LLMChain(prompt=prompt, llm=llm)

outputText = llm_chain.run(adjective="funny", content="chickens")
print(outputText)

์ด๋•Œ์˜ ๊ฒฐ๊ณผ๋Š” ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.

Why did the chicken cross the playground? To get to the other slide!

Question / Answering

Use langchain.chains.question_answering to perform Question/Answering over a Document. See langchain-sagemaker-endpoint-Q&A.ipynb for details.

prompt์˜ template์„ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.

template = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}
Answer:"""

prompt = PromptTemplate(
    template=template, input_variables=["context", "question"]
)

langchain.docstore.document์„ ์ด์šฉํ•˜์—ฌ Document๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

from langchain.docstore.document import Document

example_doc_1 = """
Peter and Elizabeth took a taxi to attend the night party in the city. While at the party, Elizabeth collapsed and was rushed to the hospital.
Since she was diagnosed with a brain injury, the doctor told Peter to stay beside her until she got well.
Therefore, Peter stayed with her at the hospital for 3 days without leaving.
"""

docs = [
    Document(
        page_content=example_doc_1,
    )
]

Now perform Question/Answering.

from langchain.chains.question_answering import load_qa_chain

question = "How long was Elizabeth hospitalized?"

chain = load_qa_chain(prompt=prompt, llm=llm)

output = chain({"input_documents": docs, "question": question}, return_only_outputs=True)
print(output)

์ด๋•Œ์˜ ๊ฒฐ๊ณผ๋Š” ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.

{'output_text': ' 3 days'}

PDF Summary

langchain-sagemaker-endpoint-pdf-summary.ipynb explains how to summarize a PDF with a Falcon FM based SageMaker Endpoint.

First, read the PDF file stored in S3 using PyPDF2 and extract its text.

import boto3
import sagemaker
import PyPDF2
from io import BytesIO

sess = sagemaker.Session()
s3_bucket = sess.default_bucket()
s3_prefix = 'docs'
s3_file_name = '2016-3series.pdf'   # file name in S3

s3r = boto3.resource("s3")
doc = s3r.Object(s3_bucket, s3_prefix+'/'+s3_file_name)

# read the PDF from S3 and extract the text page by page
contents = doc.get()['Body'].read()
reader = PyPDF2.PdfReader(BytesIO(contents))

raw_text = []
for page in reader.pages:
    raw_text.append(page.extract_text())
contents = '\n'.join(raw_text)

new_contents = str(contents).replace("\n", " ")

๋ฌธ์„œ์˜ ํฌ๊ธฐ๊ฐ€ ํฌ๋ฏ€๋กœ RecursiveCharacterTextSplitter๋ฅผ ์ด์šฉํ•ด chunk ๋‹จ์œ„๋กœ ๋ถ„๋ฆฌํ•˜๊ณ  Document์— ์ €์žฅํ•ฉ๋‹ˆ๋‹ค. ์ดํ›„ load_summarize_chain๋ฅผ ์ด์šฉํ•ด ์š”์•ฝํ•ฉ๋‹ˆ๋‹ค.

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(new_contents)

from langchain.docstore.document import Document

# only the first 3 chunks are used here, since the stuff chain puts everything into one prompt
docs = [
    Document(
        page_content=t
    ) for t in texts[:3]
]

from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate

prompt_template = """Write a concise summary of the following:


{text}


CONCISE SUMMARY:"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
chain = load_summarize_chain(llm, chain_type="stuff", prompt=PROMPT)
summary = chain.run(docs)

chain_type

  • stuff puts all the chunks into one prompt, so it can exceed the model's maximum token limit.
  • map_reduce summarizes each chunk, combines the summaries, and then summarizes the combined result; if the combined summary is still too large, it raises an error (see the sketch after this list).
  • refine summarizes the first chunk, then summarizes the second chunk together with the first summary, and repeats this process until all chunks are summarized.
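
As an example, the same summarization can be run with map_reduce instead of stuff. A minimal sketch reusing the docs and llm from above; the prompt is illustrative, not the notebook's exact one:

from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate

map_prompt = PromptTemplate(
    template="Write a concise summary of the following:\n\n{text}\n\nCONCISE SUMMARY:",
    input_variables=["text"]
)

# each chunk is summarized with map_prompt; the partial summaries are then
# combined and summarized again with combine_prompt
chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    map_prompt=map_prompt,
    combine_prompt=map_prompt
)
summary = chain.run(docs)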

ETC

Bedrock์˜ LangChain

from langchain.llms.bedrock import Bedrock
from langchain.embeddings import BedrockEmbeddings

# a model_id is required; the Titan id below is an example
llm = Bedrock(model_id="amazon.titan-tg1-large")

print(llm("explain GenAI"))
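
BedrockEmbeddings can be used in the same way. A minimal sketch, assuming the default Titan embedding model is available in the account:

emb = BedrockEmbeddings()
vector = emb.embed_query("What is GenAI?")
print(len(vector))   # dimension of the embedding vector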

Reference

LangChain Docs

LangChain - github

SageMaker Endpoint

2-Lab02-RAG-LLM

AWS Kendra Langchain Extensions

QA and Chat over Documents

LangChain - Modules - Language models - LLMs - Integration - SageMakerEndpoint

LangChain - EcoSystem - Integration - SageMaker Endpoint

Ingest knowledge base data to a Vector DB
