Utilizing Document Retrieval to Inform Large Language Models in an Educational Setting

Faculty Mentor

Dr. Bojian Xu

Presentation Type

Poster

Start Date

5-8-2024 11:15 AM

End Date

5-8-2024 1:00 PM

Location

PUB NCR

Primary Discipline of Presentation

Computer Science

Abstract

The advent of Large Language Models, such as ChatGPT, provides many opportunities for students to interact with a computer system using natural language. These models can be tailored, or “prompted,” to communicate with users in many different ways, drawing on the vast base of compressed knowledge they acquire through pretraining on enormous sets of unstructured data to mimic human speech and understanding. One drawback of this approach is that, because of this knowledge compression and a knowledge cutoff that limits awareness of current events, a large language model may provide overly general or inaccurate (hallucinated) information. One approach to solving this problem is Retrieval-Augmented Generation (RAG), a system that stores documents and retrieves those relevant to a user’s query, supplying them to the model beforehand as “context” that gives it either domain-specific information or current information it may not otherwise have access to.

My proposed system applies Retrieval-Augmented Generation in an educational setting by giving students access to a session-based chat with a large language model and giving instructors the ability to upload domain-specific knowledge or supporting documents that provide context for important current topics in a course. Because this system is computerized, students would have access to a tutor with in-depth knowledge of the material 24 hours a day. The system could give students an extra resource for asking questions about the current material or homework assignment, potentially increasing student success. Further work could include instructor supervision of conversations, as well as weekly reports of commonly asked questions to inform the instructor of any gaps in knowledge to cover during lecture.
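
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the implementation behind the poster: the names `CourseStore`, `upload`, `retrieve`, and `build_prompt` are hypothetical, the sample documents are invented for illustration, and a simple bag-of-words cosine similarity stands in for whatever document index a real system would use. The sketch stops at assembling the prompt and does not call any language model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CourseStore:
    """Hypothetical in-memory store for instructor-uploaded documents."""

    def __init__(self):
        self.docs = []  # list of (title, text, term counts)

    def upload(self, title, text):
        """Add one document and precompute its term counts."""
        self.docs.append((title, text, Counter(tokenize(text))))

    def retrieve(self, query, k=2):
        """Return up to k (title, text) pairs most similar to the query."""
        q = Counter(tokenize(query))
        scored = [(cosine(q, vec), title, text) for title, text, vec in self.docs]
        scored.sort(key=lambda item: item[0], reverse=True)
        # Documents with no word overlap (score 0) are never returned.
        return [(title, text) for score, title, text in scored[:k] if score > 0]


def build_prompt(store, history, question, k=2):
    """Assemble retrieved context, prior session turns, and the new question."""
    context = "\n".join(f"[{t}] {x}" for t, x in store.retrieve(question, k))
    turns = "\n".join(f"{role}: {msg}" for role, msg in history)
    return (
        "Answer using the course context below.\n"
        f"Context:\n{context}\n"
        f"Conversation so far:\n{turns}\n"
        f"Student: {question}\nTutor:"
    )


# Assumed example data, not taken from any real course.
store = CourseStore()
store.upload("Week 3 notes", "A binary search tree keeps keys in sorted order for fast lookup.")
store.upload("Syllabus", "Homework is due every Friday at noon.")
prompt = build_prompt(store, [], "When is homework due?")
```

In this sketch, `history` is a list of `(role, message)` pairs carried through one chat session, and the string returned by `build_prompt` is what would be sent to the language model. Word-count similarity is chosen only to keep the example dependency-free.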
