Utilizing Document Retrieval to Inform Large Language Models in an Educational Setting
Faculty Mentor
Dr. Bojian Xu
Presentation Type
Poster
Start Date
5-8-2024 11:15 AM
End Date
5-8-2024 1:00 PM
Location
PUB NCR
Primary Discipline of Presentation
Computer Science
Abstract
The advent of Large Language Models, such as ChatGPT, provides many opportunities for students to interact with a computer system using natural language. These models can be tailored, or “prompted,” to communicate with users in many different ways, drawing on the vast base of compressed knowledge they acquire by pretraining on enormous sets of unstructured data to mimic human speech and understanding. One drawback of this approach is that, because of this knowledge compression and because a knowledge cutoff leaves the model unaware of recent events, a large language model may provide overly general or inaccurate (hallucinated) information. One approach to solving this problem is Retrieval-Augmented Generation (RAG), a system that stores documents and, for each user query, retrieves the most relevant ones and supplies them to the model beforehand as “context,” giving the model domain-specific information or current information it would not otherwise have access to.
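To make the retrieval step concrete, here is a minimal sketch in Python. It is not the author's implementation: it swaps in simple bag-of-words cosine similarity for the learned embeddings a production RAG system would typically use, and the names `retrieve` and `build_prompt`, as well as the prompt wording, are illustrative assumptions. The language model call itself is omitted; the sketch only shows how relevant documents are selected and placed in front of the user's query as context.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k stored documents most similar to the query.

    Documents with zero similarity are dropped, so an unrelated query
    yields no context rather than arbitrary documents.
    """
    q_vec = Counter(tokenize(query))
    scored = [(cosine_similarity(q_vec, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Prepend the retrieved documents to the user's question as context.

    The instruction text is an illustrative assumption, not a prescribed format.
    """
    context = "\n\n".join(retrieve(query, documents, k))
    return (
        "Answer the student's question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

For example, given a small store of course notes, a question about a homework deadline pulls in only the note that mentions the deadline, while a query that shares no words with any stored document returns no context at all.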
My proposed system applies Retrieval-Augmented Generation in an educational setting: students get access to a session-based chat with a large language model, and instructors can upload domain-specific knowledge or supporting documents that give context for important current topics in a course. Because this system is computerized, students would have access to a tutor with in-depth knowledge of the material 24 hours a day. This system could give students an extra resource for asking questions about the current material or homework assignment, potentially increasing student success. Further work could include instructor supervision of conversations, as well as weekly reports of commonly asked questions that would alert the instructor to any knowledge gaps to cover during lecture.
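The session-based design described above could be organized roughly as follows. This is a hypothetical sketch, not the proposed system's actual code: `CourseStore`, `ChatSession`, and the `generate` callable (a stand-in for whatever language model API is used) are invented names, and retrieval here is a simple word-overlap ranking. It illustrates instructor uploads feeding a per-course document store, a conversation history that lives only for one session, and a question log that could back the weekly report of common questions mentioned as further work.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical design: class and method names are illustrative only.


def words(text: str) -> set[str]:
    """Lowercase word set with basic surrounding punctuation stripped."""
    return {w.strip(".,?!:;()").lower() for w in text.split()} - {""}


@dataclass
class CourseStore:
    """Documents an instructor has uploaded for one course."""
    documents: list[str] = field(default_factory=list)
    question_log: list[str] = field(default_factory=list)

    def upload(self, text: str) -> None:
        """Instructor adds a supporting document to the course store."""
        self.documents.append(text)

    def top_documents(self, query: str, k: int = 2) -> list[str]:
        """Rank documents by how many words they share with the query."""
        q = words(query)
        ranked = sorted(self.documents, key=lambda d: len(q & words(d)), reverse=True)
        # Keep only documents that share at least one word with the query.
        return [d for d in ranked[:k] if q & words(d)]

    def common_questions(self, n: int = 3) -> list[tuple[str, int]]:
        """Most frequent questions (case-normalized), e.g. for a weekly report."""
        return Counter(q.strip().lower() for q in self.question_log).most_common(n)


@dataclass
class ChatSession:
    """One student's conversation; the history exists only for this session."""
    store: CourseStore
    generate: Callable[[str], str]  # placeholder for any LLM call
    history: list[tuple[str, str]] = field(default_factory=list)

    def ask(self, question: str) -> str:
        """Build a prompt from retrieved course material plus history, then answer."""
        self.store.question_log.append(question)
        context = "\n".join(self.store.top_documents(question))
        transcript = "\n".join(f"Student: {q}\nTutor: {a}" for q, a in self.history)
        prompt = (
            f"Course material:\n{context}\n\n"
            f"Conversation so far:\n{transcript}\n\n"
            f"Student: {question}\nTutor:"
        )
        answer = self.generate(prompt)
        self.history.append((question, answer))
        return answer
```

Keeping the question log on the course store rather than on each session is one possible choice: it lets a report aggregate questions across all students while each session's history stays private to that conversation.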
Recommended Citation
Doner, Douglas J., "Utilizing Document Retrieval to Inform Large Language Models in an Educational Setting" (2024). 2024 Symposium. 6.
https://dc.ewu.edu/srcw_2024/ps_2024/p2_2024/6
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.