Undergraduate AI Researcher at AIEA Lab
#AIEA Lab
The AI Explainability and Accountability (AIEA) Lab at the University of California, Santa Cruz, focuses on enhancing the transparency and reliability of autonomous systems. Led by Assistant Professor Leilani H. Gilpin, the lab designs methods that enable autonomous systems to explain their decision-making processes.
#Key Projects
Here are the key projects from AIEA Lab:
- Explaining Large Language Model (LLM) Failures: Investigating the causes of hallucinations in LLMs to enhance their reliability and trustworthiness.
- Robust Autonomous Vehicles: Creating a framework to detect and explain errors in autonomous vehicles, aiming to improve their safety and reliability.
#Team Members
Leilani H. Gilpin: Assistant Professor leading the lab.
Priyesh Vakharia: Ph.D. student focusing on Explainable AI, Human-Centered AI, and Natural Language Processing. Author of ProSLM.
#ProSLM
ProSLM is a neurosymbolic framework developed to enhance the reliability of LLMs in answering questions. Unlike approaches that rely on the LLM alone, ProSLM integrates formal logic through a domain-specific knowledge base and a reasoning system.
It has two key functions:
- Context gathering, which generates explainable and relevant context for queries.
- Validation, which confirms the factual accuracy of outputs.
By linking symbolic reasoning (through Prolog) with LLM capabilities, ProSLM offers both creativity and reliability. This framework operates efficiently without additional LLM training, providing trustworthy and transparent AI responses.
#LLM - Logic Project
We are building a system that verifies factual statements through a neural-symbolic process. A neural translator converts the user's input into Prolog queries, which represent structured assertions such as the number and names of dining halls. A symbolic component then evaluates each query's truth value against known data, and the results determine whether the overall statement is accurate. In the dining hall example, one query returns False, so the system reports that the user's statement is incorrect and lists the truth value of each query for transparency.
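The pipeline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hall names, the `dining_hall/1` and `dining_hall_count/1` predicates, and the hard-coded `translate` stub (standing in for the neural translator) are all hypothetical and are not taken from the project.

```python
# Placeholder fact base; the hall names are hypothetical, not real campus data.
KNOWN_DINING_HALLS = {"hall_a", "hall_b", "hall_c"}

# Canned translations standing in for the neural translator (an LLM in practice).
CANNED_TRANSLATIONS = {
    "There are 4 dining halls, including hall_a and hall_b.": [
        "dining_hall_count(4)",
        "dining_hall(hall_a)",
        "dining_hall(hall_b)",
    ],
}


def translate(statement: str) -> list[str]:
    """Return Prolog-style queries for a statement (stub for the neural step)."""
    return CANNED_TRANSLATIONS[statement]


def evaluate(query: str) -> bool:
    """Symbolic step: check one single-argument query against the fact base."""
    functor, _, rest = query.partition("(")
    argument = rest.rstrip(")")
    if functor == "dining_hall_count":
        return int(argument) == len(KNOWN_DINING_HALLS)
    if functor == "dining_hall":
        return argument in KNOWN_DINING_HALLS
    raise ValueError(f"unknown predicate: {functor}")


def verify(statement: str) -> tuple[bool, dict[str, bool]]:
    """Return the overall verdict plus the truth value of every query."""
    results = {query: evaluate(query) for query in translate(statement)}
    return all(results.values()), results


verdict, truth_values = verify(
    "There are 4 dining halls, including hall_a and hall_b."
)
print(verdict)
print(truth_values)
```

Returning the per-query truth values alongside the verdict mirrors the transparency goal: the caller can see which assertion failed, not just that the statement was rejected.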
#Converting NL to Prolog
Our first task was to convert Natural Language (NL) to Prolog, so that we could build a goal tree from the result. For an example of the conversion, look on
#What is Prolog
Prolog is a declarative programming language designed for developing logic-based AI applications. Developers can set rules and facts around a problem, and then Prolog's interpreter will use that information to automatically infer solutions.
Here’s an example of using a Prolog-like system in Python for a medical diagnosis:
In this example, the system identifies symptoms and uses logical inference (backward chaining) to assess the likelihood of a flu diagnosis.
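A minimal sketch of such a Prolog-like system in Python follows. The symptom names and rules are hypothetical and chosen only for illustration; they are not medical guidance. The sketch returns a yes/no answer rather than a likelihood, which is a simplification.

```python
# Rules map a goal to alternative bodies; each body is a list of subgoals.
# Read as Prolog: flu :- fever, respiratory_symptoms.
#                 respiratory_symptoms :- cough.
#                 respiratory_symptoms :- sore_throat.
RULES = {
    "flu": [["fever", "respiratory_symptoms"]],
    "respiratory_symptoms": [["cough"], ["sore_throat"]],
}


def prove(goal, facts, rules, seen=frozenset()):
    """Backward chaining: start from the goal and work back to known facts."""
    if goal in facts:
        return True
    if goal in seen:
        # Already trying to prove this goal higher up; avoid infinite recursion.
        return False
    for body in rules.get(goal, []):
        if all(prove(sub, facts, rules, seen | {goal}) for sub in body):
            return True
    return False


observed = {"fever", "cough"}  # hypothetical patient symptoms
print(prove("flu", observed, RULES))
```

The `seen` set guards against cyclic rules, which a real Prolog interpreter would not do automatically.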
#Converting from NL
To convert Natural Language (NL) to Prolog, first review existing methods and datasets, then use models such as GPT-4 and Gemini to create 100 example NL-to-Prolog pairs. Check existing resources (e.g., Simpson’s KB) for reusable data, and use BLEU and ROUGE scores to assess translation accuracy. Manually verify 10 samples for quality, fixing syntax issues and establishing a reliable validation method. If a Knowledge Base (KB) is used, determine whether the model can produce Prolog output that is compatible with that KB.
The task is complete when a dataset of validated examples has been created and organized in shared documents for consistency, a framework for ongoing dataset testing and Prolog accuracy validation is in place, and supporting resources have been compiled, such as a spreadsheet and links to relevant guides (e.g., on Retrieval-Augmented Generation, RAG).
gpt4 (100 example demo dataset)
gemini-1.5-pro-001 (100 example demo dataset)
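To make the BLEU and ROUGE idea concrete, the sketch below scores a generated Prolog string against a reference using unigram overlap only. It is a simplified stand-in, not full BLEU (no higher-order n-grams or brevity penalty) and not an official ROUGE implementation; the tokenizer and the example clauses are assumptions made for illustration.

```python
import re
from collections import Counter

# Split Prolog text into identifiers, numbers, ":-", and single punctuation marks.
TOKEN_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|:-|[^\s\w]")


def tokenize(prolog_text: str) -> list[str]:
    return TOKEN_PATTERN.findall(prolog_text)


def unigram_scores(candidate: str, reference: str) -> dict[str, float]:
    """Clipped unigram precision (BLEU-1 style), recall (ROUGE-1 style), and F1."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand_counts & ref_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


reference = "dining_hall(hall_a)."   # hypothetical gold translation
candidate = "dining_hall(hall_b)."   # hypothetical model output
scores = unigram_scores(candidate, reference)
print(scores)
```

Token overlap rewards surface similarity, so a clause with one wrong argument can still score highly. This is one reason the manual check of a sample remains useful.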
#Evaluation dataset
To create an evaluation dataset, we first scraped the relevant data and converted it into Prolog code, forming the foundation of our analysis and quality assurance framework.
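The source does not show how the scraped data was converted, so the following is only one possible approach. It turns a question/answer pair into a Prolog fact, assuming a hypothetical `faq/3` predicate with the entry index, question, and answer as arguments.

```python
def prolog_atom(text: str) -> str:
    """Quote text as a Prolog atom, escaping backslashes and single quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def faq_fact(index: int, question: str, answer: str) -> str:
    """Build one fact for the hypothetical faq/3 predicate."""
    return f"faq({index}, {prolog_atom(question)}, {prolog_atom(answer)})."


fact = faq_fact(
    18,
    "Can I do research at Baskin Engineering?",
    "Yes, many professors encourage undergraduates to conduct research, "
    "with opportunities for pay or course credit.",
)
print(fact)
```

Storing answers as opaque quoted atoms keeps the conversion simple but limits reasoning to lookup. Finer-grained predicates would be needed for queries such as counts or eligibility rules.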
The table below shows selected sample questions from UC Santa Cruz FAQ pages, including the Baskin Engineering Student Ambassadors FAQs, with the official answer, a simplified answer, and the source for each entry.
| Index | Questions | Answers | Simplified Answers | Source Name | Source |
|---|---|---|---|---|---|
| 12 | How do I get my UC SHIP ID card? | "Insurance ID cards are electronic. Download the Sydney Health app to access your ID card; find Student Health Center location, hours and services; view medical, pharmacy, dental and vision coverage and claims information; and get notifications for benefit changes and action items. For a hard copy, contact UC SHIP Member Services at 1-866-940-8306." | Insurance ID card is electronic, available via Sydney Health app. For a hard copy, contact UC SHIP Member Services at 1-866-940-8306. | UCSC Student Health Center FAQs | https://healthcenter.ucsc.edu/billing-insurance/101.html |
| 18 | Can I do research at Baskin Engineering? | Yes. Many professors encourage undergraduate students to conduct research and provide various opportunities throughout the year. Some research positions are paid and/or can be completed for course credit. | Yes, many professors encourage undergraduates to conduct research, with opportunities for pay or course credit. | Baskin Engineering Student Ambassadors FAQs | https://engineering.ucsc.edu/besa/faq/ |
| 21 | What are the class sizes for the lower and upper division classes under the engineering school? | Annual undergraduate class size by division and department can be viewed on the Institutional Research, Assessment, and Policy Studies website. | Annual undergraduate class size by division and department can be viewed on the Institutional Research, Assessment, and Policy Studies website. | Baskin Engineering Student Ambassadors FAQs | https://engineering.ucsc.edu/besa/faq/ |
Using this data, we developed a Retrieval-Augmented Generation (RAG) Question Answering (QA) dataset, structured to include both original and generated responses to common questions about UC Santa Cruz's Baskin Engineering School. For each question, the dataset includes the official answer, a generated response, and an evaluation score and reason. This setup allows us to assess the quality and completeness of AI-generated responses against real user needs.
| Index | Questions | Answers | Generated Response | G Eval Score | G Eval Reason | Source Name | Source | Source Docs |
|---|---|---|---|---|---|---|---|---|
| 22 | What is the diversity breakdown for engineering? | The university creates charts detailing undergraduate major count by gender and race/ethnicity every academic year. That data can be viewed on the Institutional Research, Assessment, and Policy Studies website. | To find the diversity breakdown for engineering at UC Santa Cruz, you can refer to the student statistics on the Majors page provided by the Institutional Research, Analytics, and Planning Support (IRAPS) at this link: https://iraps.ucsc.edu/student-statistics/majors.html. This page includes detailed information on the demographics of students in engineering, including breakdowns by race/ethnicity and gender. | 0.5900305399 | The actual output provides a relevant link to obtain detailed information on the diversity breakdown for engineering, which aligns with the expected output. However, it does not directly provide the diversity breakdown within the response itself, resulting in a slight omission of detail. | Baskin Engineering Student Ambassadors FAQs | https://engineering.ucsc.edu/besa/faq/ | "To answer the question about the diversity breakdown for engineering at UC Santa Cruz, you can refer to the student statistics on the Majors page: https://iraps.ucsc.edu/student-statistics/majors.html provided by the Institutional Research, Analytics, and Planning Support (IRAPS). This page includes detailed information on undergraduate and graduate majors, including breakdowns by race/ethnicity and gender. Specific data for engineering can be found by exploring the relevant sections under undergraduate and graduate majors by race/ethnicity and gender. To understand which majors are housed in the School of Engineering at UC Santa Cruz, refer to the Degree Programs page: https://engineering.ucsc.edu/academics/degree-programs/ provided by the Baskin School of Engineering." |
| 26 | Can I switch into computer science if I was accepted into another engineering program? | The UC Santa Cruz computer science major is currently impacted. This means that incoming freshmen must be admitted as a proposed computer science (B.A or B.S.) student to be eligible to later declare a computer science major, and transfer students must be admitted to the major at the time of admission to UCSC. The computer science minor is an option for students interested in pursuing computer science programming. | No, you cannot switch into Computer Science if you were accepted into another engineering program. To pursue either the B.A. or B.S. in Computer Science at UCSC, you must have been admitted as a proposed Computer Science Major. | 0.8634618716 | The actual output accurately states that a switch to Computer Science is not possible if not initially admitted as a Computer Science major, aligning with the expected output. However, it omits details about the specific steps or policies involved in the process. | Baskin Engineering Student Ambassadors FAQs | https://engineering.ucsc.edu/besa/faq/ | "Admission to the Computer Science majors (B.A. and B.S.) is selective. The Computer Science majors at UCSC are impacted and in order to pursue either of these majors, students must have been admitted to UCSC as a proposed Computer Science Major. Students not admitted as proposed Computer Science will not be able to pursue either of these majors. https://undergrad.engineering.ucsc.edu/advising/policies-forms-petitions/proposed-major-standing/ To understand which majors are housed in the School of Engineering at UC Santa Cruz, refer to the Degree Programs page: https://engineering.ucsc.edu/academics/degree-programs/ provided by the Baskin School of Engineering. Options for Students Not Offered Admission Transferring to UCSC Dual Admission Transfer Admission Guarantee (TAG) Options for Students Not Offered Admission UC Santa Cruz is a selective campus, and each year many excellent students are not offered admission because of capacity limits or additional preparation needed in certain areas. We understand your disappointment, but if attaining a UCSC degree is still your goal, we’d like to offer some alternative pathways to get you on your way toward achieving your dream." |
The table above illustrates sample questions and generated responses. The G Eval Score (e.g., 0.59 or 0.86) quantifies how closely each response matches the expected answer, while the G Eval Reason explains the score, pointing out where the response aligns with expectations and where it falls short, such as missing procedural details.
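As a small illustration of how such scores might be used, the sketch below summarizes the two scores shown in the table and flags entries for manual review. The 0.7 threshold and the record layout are assumptions made for this example, not values from the project.

```python
from statistics import mean

# Scores copied from the two sample rows in the table above.
rows = [
    {"index": 22, "g_eval_score": 0.5900305399},
    {"index": 26, "g_eval_score": 0.8634618716},
]

REVIEW_THRESHOLD = 0.7  # hypothetical cut-off for manual review


def summarize(rows, threshold):
    """Return the mean score and the indices of entries scoring below threshold."""
    scores = [row["g_eval_score"] for row in rows]
    flagged = [row["index"] for row in rows if row["g_eval_score"] < threshold]
    return {"mean_score": mean(scores), "flagged": flagged}


summary = summarize(rows, REVIEW_THRESHOLD)
print(summary)
```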
#Evaluation Score
Each entry in the dataset is scored on accuracy and completeness: the G Eval Score quantifies alignment with the expected response, and the G Eval Reason explains any misalignment or omission.
G-Eval
This evaluation approach lets us assess the quality of responses, identify areas for improvement, and evaluate the system's ability to generate accurate, contextually relevant answers. The score shows whether a generated response aligned closely with the expected answer or missed important details. BLEU and ROUGE scores are also used to measure translation accuracy when converting NL to Prolog for logic-based AI systems.
To further streamline quality assessment, we use tools such as RAGAS and G-Eval for automated scoring, together with human evaluation to verify dataset accuracy, so that AI-generated responses meet user needs in education and support contexts.
RAGAS