Ethics and Evidence: AI Tools in Clinical Education and Practice
Speaker:
Richard Truxillo, DO
Assistant Professor, Family and Community Medicine, VTCSOM
Objectives
Upon completion of this activity, participants will be able to:
- Differentiate between evidence-based AI tools (e.g., OpenEvidence, UpToDate) and general-purpose large language models (e.g., ChatGPT) in terms of data sources, reliability, and ethical considerations.
- Identify key ethical challenges related to AI use in clinical education — including bias, transparency, accountability, and data privacy.
- Apply governance and teaching strategies to responsibly integrate AI tools into health professional education while maintaining academic and clinical integrity.
Invitees
All interested Carilion Clinic, VTC, and RUC physicians, faculty, and other health professions educators.
*The Medical Society of Virginia is a member of the Southern States CME Collaborative, an ACCME Recognized Accreditor.
This activity has been planned and implemented in accordance with the accreditation requirements and policies of the Southern States CME Collaborative (SSCC) through the joint providership of Carilion Clinic's CME Program and Carilion Clinic Office of Continuing Professional Development. Carilion Clinic's CME Program is accredited by the SSCC to provide continuing medical education for physicians. Carilion Clinic's CME Program designates this live activity for a maximum of 1 AMA PRA Category 1 Credit™.
Physicians should claim only the credit commensurate with the extent of their participation in the activity.
Well, hello everyone. Rich Truxillo here. Looking over, I see some familiar faces. I appreciate you coming and spending some time with me over lunch today, and I hope I can make it worth your while. My main audience is healthcare educators, but I see some others who don't directly have access to learners, so I hope you get something out of this as well. The learning objectives today are to differentiate between evidence-based AI tools and the general-purpose large language models like ChatGPT or Grok, some of the things that you may be using or see others using. But I also want to talk about the ethical challenges we have: bias, transparency, accountability, and privacy issues that have come up as this technology has been rapidly advancing like a freight train out of control. And we want to apply strategies for the ethical application of AI, particularly in regard to our teaching of new clinicians, new nurses, new APPs, new physicians, because that's a very important thing. We are not going to be teaching the same way we were 10 years ago.

So, some disclaimers first. I have no vested financial interest in any of the products that you see here. I'm a heavy user of artificial intelligence and large language models. I've even developed a couple of my own models here at home. I use some agentic AI for personal and business use. I also use it as a content creator outside of medicine, so I know other applications of it outside of medicine, and how it can be used well and how it can be used for maleficent reasons as well. And I've been heavily involved in computing since the age of 11. In fact, here are some old pictures of me from when I worked at Corning as a database engineer.
And you can see me writing my first music back then, when I was 18 years old, with my Billy Joel sheet music there on my little Yamaha keyboard.

My views on AI: it's exciting, it's energizing, it's a wonderful application of technology that's finally all come together. For me, this concept of beneficence is the number one priority. There should be no patient harm; we shouldn't do harm to humanity. We should leverage it to reduce workload for less complex tasks, but we should also use it in a way that helps us with the complex tasks that we do. Let's take some of the cognitive burden and cognitive load off of the clinical tasks that we do every day. I would love for AI to be able to automatically fill out a prior authorization form for me instead of having to pick up the phone and have a 15-minute argument. But also, as educators, it can be another useful tool in our armamentarium for doing a variety of things; we'll talk about that. The one thing I do want to bring up is that this is not a complete replacement for human thought. The human brain is still a miracle to me, and very complex. Although computers have access to a lot of information, it still takes that human spin, that human ingenuity, to power these things. And there are also some environmental concerns, challenges or impacts, that I feel humanity will collectively rise to meet and overcome. But I wanted you to have my own personal views, because views are all over the place when it comes to artificial intelligence. So I'm going to stop here. I'm looking over here at my monitor; I'm looking over here at you all.
And just with a raise of hands, if you don't mind: who here feels like they're a novice to using large language models like ChatGPT, doesn't know much about it, just knows that it's out there? So I see Mike raising his hand, Phyllis raising her hand. Anyone else? All right, fantastic. How many people are using it a little bit? Maybe you're using OpenEvidence; I know a lot of clinicians have gotten onto OpenEvidence. I know the internal medicine department is really excited about OpenEvidence, because I've talked to Johnuet, and he and I have had some fascinating conversations about that as I developed my own custom GPT for clinical decision support. All right. And then, how many people are experts here? Raise your hand if you think that you're a relative expert. Sherry's still got her hand raised, so I imagine she's going to school me on all this. I'm just picking on you. And Taylor, I can tell you, Taylor Richardson could give me a run for my money, I bet.

So let me move on to one point first. I want to talk about the power consumption of AI, mentioning that environmental impact, because I think it ties into ethics. The very early applications of AI were very inefficient; our compute wasn't very high. We were using something called CUDA cores in NVIDIA graphics processing units, which are mainly in gaming graphics cards and computers. They required a heavy power load just for simple queries, but our compute has really come a long way. It's gotten more efficient, particularly in late 2024 and now this year. So the power consumption, and you'll see the actual numbers I've been able to pull, has gotten a lot better. However, I still feel like we need sources of renewable, clean energy to help handle this load, because we're using it more and more.
Even though we're getting more efficient, the technology is embedding itself in society. Maybe we could use solar energy; however, that technology lags behind. It's not efficient enough to keep up with power demands, so you have to build it on a large scale. I think AI itself can help us solve some of these efficiency and energy issues; it could help engineers develop more efficient power generation. And then the other thing is that a lot of these server farms used for AI require a lot of cooling, which has a power cost as well. It definitely runs hot. Someone asked a question a long time ago that made me think about carbon emissions, so I tried to break this down into carbon emissions and where they've come.

If we look at these large language models, comparing ChatGPT, Gemini, Claude (which is mainly used for coding), Llama (which is used for general-purpose things), and Grok (which is just a melting pot of everything), and if you compare one watt-hour to being able to power a light bulb for a few seconds, what you can see is that over time the power consumption has come down quite a bit. But the question we have to ask ourselves is this: even though the carbon footprint has definitely gotten better, and the energy usage has gotten better on a per-query basis, we are making billions of queries a day. So we are still going to have an issue, particularly in the United States, because our power grid really needs to be reinforced to handle this power consumption. How are we going to power these autonomous AI agents? So I wanted to throw that out there just to give you some thought. But the AI tools that clinicians mainly use, and I've tried to pull as much data as possible, are really UpToDate, DynaMed, OpenEvidence, ChatGPT, and Microsoft Copilot.
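The "better per query, but billions of queries" point can be made concrete with a quick back-of-envelope calculation. A minimal sketch; the per-query wattage and query volume below are illustrative assumptions, not measured figures for any specific model:

```python
# Back-of-envelope estimate of daily energy use for LLM queries.
# The per-query figure and query volume are illustrative assumptions,
# not measured values for any particular model or vendor.

def daily_energy_kwh(wh_per_query: float, queries_per_day: float) -> float:
    """Total daily energy in kilowatt-hours (1 kWh = 1000 Wh)."""
    return wh_per_query * queries_per_day / 1000.0

# Assume ~0.3 Wh per query and 1 billion queries per day (hypothetical).
kwh = daily_energy_kwh(0.3, 1e9)
print(f"{kwh:,.0f} kWh/day")                                  # 300,000 kWh/day
print(f"≈ {kwh / 30:,.0f} average US homes (at ~30 kWh/day)")  # ≈ 10,000 homes
```

Even modest per-query figures multiply into a grid-scale load, which is the speaker's point about reinforcing the power grid.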
Those are the the main ones. And so the the the curated contents reference engines are upto-date and Dynamed. Uh so most of us here who are physicians uh or APS we're used to using we we we kind of grew up on up-to-date and dynamed right we would uh we would we would look through article if we didn't have time to read through a specific journal or if we couldn't remember a reference we would want to read a cur author curated you know um I call it a plate but an article on rheumatoid arthritis or acute pancreatitis itis and what the current evidence says we should do for this and how it's diagnosed and the pathophysiology and all of that. And then we have a newer tool open evidence which is kind of deeper. It's more evidence-based AI. So instead of like these editorial um you know type uh repository or database we have access to uh to JAMAMA and access to the New England Journal of Medicine which if you're not aware of um if you're a a subscriber to like JAMA uh you can get those articles you know immediately but if you're not a subscriber it's not really in public domain for 12 months. But I'm going to explain to you why that may not necessarily be a problem because usually other people will report and give their take on medical literature that comes out in these um large language models like chat GPT and gro and all that pick that up and ingest it into their model. And then we have the general purpose large language models. We have chat GPT, Microsoft C-Pilot, Grock. Those are the things that consumers, people who are, you know, generally noviceses are at least familiar with. Hey, I know what these do. Um, I can ask it a question and it'll give me an answer. So, let me just dive into just a brief overview of all of these. So, this is what open evidence looks like if you've never used it. It's it's free to use. 
A lot of people have already picked up on it, but it's basically a medical AI search and summarization tool based on evidence, and they actually have contracts with JAMA, the New England Journal of Medicine, the Lancet, and other high-profile, well-regarded peer-reviewed journals. So you know that you're getting good, current evidence when you use this tool. But it's focused only on medical applications. You ask it a medical question, you're going to get a medical answer, but you can't really ask it general knowledge questions. It's really meant for real-time ingestion and reasoning on medical issues. I have a query here as an example: my wife had a cubital tunnel release, and I wanted current evidence on what the outcomes were and what we needed to watch out for. So here is my query on that. It's still very much a narrative type of response, but you'll notice that it cites its sources. In medicine, particularly as physicians, we like to cite where our information is coming from. We like to know certain landmark trials. We like to know that the medical advice we get is accurate and based on grounded science. So OpenEvidence has a large advantage in that regard, in my opinion, and it's growing.

And then we have UpToDate. UpToDate has kind of been our trusted clinical reference resource. It's expert editorial reviews. It's not updated as fast as OpenEvidence, because everything is editorial and reviewed by peers. But the one thing it has is CME credit, and CME credit is great. They recently reached out to me, and the physician builder team is testing their new AI search engine within their own ecosystem. So, to give you an example, I asked it: tell me recent evidence on cubital tunnel release surgery.
And this is what I got. I didn't really get a nice summary, but the search engine did bring up relevant articles I could read. And buried in those articles, if you look at the bottom, you'll see "Ulnar neuropathy at the elbow and the wrist." I can see "Surgery for refractory or severe symptoms." I could read that like I'm used to. But I'm going to be completely honest and transparent with you: the reason why I love UpToDate is the CME credits. I still use UpToDate, particularly to gather CME credits. I still like that editorialized content, and as a physician, sometimes I just want to read an article and understand it. I still have that little bit of old school in me, as Dr. Damat, and, when he was here, Dick Wardrop, would say.

And then there's ChatGPT, in particular ChatGPT-5, in a clinical context. This is a large language model with a broad knowledge base; we are talking tons and tons of information. They've ingested most of the information on the internet. There's no built-in evidence retrieval or citation validation, though it will cite sources if you ask it to. It has a very heavy reliance on web-based and social media sources, and that carries risk: the risk of hallucination and misinformation. For those of you who are novices at this, hallucination is when the large language model just makes something up. For example, some of the models that I'm testing in Epic, in their early infancy, made up the name of a doctor that did not exist in our system; made up the name of a neurologist who did not exist. As an informaticist, that's one of the biggest challenges: making sure these large language models do not make things up and give you bad information, because that is, in my opinion, the ultimate risk to patient care. We need to guard against that.
So, one of my takeaways from this is: when you're using large language models, always trust but verify. Think to yourself, could this be a hallucination? Does it make sense? How have I combatted that? Well, in full disclosure, I'm developing my own custom GPT, TruxGPT, and I've actually made a diagnosis with it. Does anyone here know what West syndrome, or infantile spasms, is? It's pretty rare, but I was able to diagnose my son with it. I happened to be at a conference, Virginia HIMSS, and he started having these weird tics right after I gave my panel talk, and I didn't think much of it until I got him home from Williamsburg and took a video of it. It was almost like repetitive epileptic activity, but it was different from your normal focal seizure activity. So I took a video of it, uploaded it to TruxGPT, and it nailed the diagnosis and told me I needed to take him to the ER. I called my PCP just to say, "Hey, I normally don't do this after hours, but I think I need to take my child to the ED. This is what's going on. Here's the video." And sure enough, it was right on. I even uploaded the EEG at 3:00 a.m., and it interpreted it completely, gave me full treatment recommendations, and 12 hours later the neurologist came in and gave the exact summary that my model did. The difference is, my model has confidence intervals in it. It's really meant for point of care: I'm three levels deep in my clinical algorithm and I need to know what to do next, with the best evidence available. So it's pointed at using our best medical journals and citing evidence. It will look at other web sources, but it'll assign them a lower confidence, for full disclosure and transparency.
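The source-tiering idea described here, weighting peer-reviewed journals above general web content, can be sketched as a simple scoring rule. This is a hypothetical illustration of the concept, not the actual TruxGPT implementation; the tier names and weights are invented for the example:

```python
# Hypothetical sketch of tiered source confidence, in the spirit of
# weighting peer-reviewed sources above general web content.
# Tier names and numeric weights are illustrative assumptions.
SOURCE_TIERS = {
    "peer_reviewed_journal": 0.9,   # e.g. NEJM- or JAMA-class journals
    "curated_reference": 0.75,      # editorial, expert-reviewed content
    "institutional_site": 0.6,      # academic medical center pages
    "general_web": 0.3,             # blogs, forums, social media
}

def confidence(citations: list[str]) -> float:
    """Average tier score across a response's citations; 0 if none.

    Unknown source types fall back to the general-web weight.
    """
    if not citations:
        return 0.0
    scores = [SOURCE_TIERS.get(c, SOURCE_TIERS["general_web"]) for c in citations]
    return sum(scores) / len(scores)

print(confidence(["peer_reviewed_journal", "curated_reference"]))
print(confidence(["general_web"]))
```

The design choice matches the talk's point: the answer is still shown, but a response built on social media posts surfaces with visibly lower confidence than one grounded in journals.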
But this is something that I'm working on, and I just wanted to show it as a possibility, for you to dream about what artificial intelligence could potentially do. Is it perfect? No. Is it a substitute for your own clinical decision-making? No; it says that in my disclaimer right there. You have to have a good foundation as a clinician first to be able to use these tools. But remember, we have over 1,600 new articles coming out every day. It's impossible for us to read them all. So having curated information at our fingertips is what's going to help us as physicians and clinicians take care of our patients. And that's the ultimate goal: to take care of them, and to take care of them safely.

Now, on to Microsoft Copilot. How many of you use Microsoft tools in your day-to-day work? I'd imagine most of you do, right? We're kind of forced into the Microsoft Office 365 suite; you're watching a PowerPoint now. Microsoft Copilot is their LLM embedded in their Microsoft Office tools, and as a teacher, it's meant to augment your productivity with teaching. It may be of limited use as a clinical tool, but it can be useful for developing content rapidly; you just have to give it some oversight. You can't leave it up to chance. So, for example, I asked for three PowerPoint slides describing the organelles in a cell, with graphics. Why? Because my daughter was working on a science project, and she needed to know what the organelles in the cell were. So I took my query here, and it gave me a graphic. It cited where I could get that graphic from, so I could assign attribution to it. It gave me the content layout for the slides and did everything except make the slides for me. But if I asked it to, it could give me a design template.
So, if you're not familiar with this, for the novices who are here: in the upper right-hand corner of all of your Microsoft tools, you'll see this Copilot button, and you can ask Copilot to summarize a document. You can add speaker notes for a slide. You can develop quiz questions from your slides. You can have it help you develop content. But the key here is that you need to make sure you're providing appropriate oversight.

So, in comparison: you've got OpenEvidence, which is domain-specific for medicine and cites specific, in my opinion very high-quality, evidence-based sources. And as Dr. Damont said, OpenEvidence does have CME credit as well. There is a drop-down you have to find, but it does. I have not done that yet, but I need to, because I've got to update my AFP and ABPM subspecialty credits. Maybe I can find an agentic AI to keep that up automatically. UpToDate is still our curated, expert-reviewed place for articles. I still like it; I still use it. It is limited in focus because it's not updated as often, so you're not always going to get the most, pardon the phrase, this is horrible, the most up-to-date information. Gosh, I feel bad just saying that. But they're working on it, and it still has value to me. ChatGPT is conversational; it'll have a conversation. It lacks clinical-grade validation, but it's rapidly gaining traction, and customized GPTs like mine can point it to use more medical evidence if we give it the appropriate so-called programming, or prompt engineering, in the background. And then I'm going to mention Grok. Grok is controversial. It's conversational, but it lacks clinical-grade validation. They have a huge amount of compute, and they are ingesting anything and everything, which needs refinement for clinical use.
I would never use it for clinical purposes, but I do experiment with it, and it scarily gets better every day. So I'm just including it in the conversation, because it may emerge as a potential competitor to these other large language models simply because of the size of the compute it has compared to a lot of other systems.

So let's talk about some recent comparative studies; we've got to have some evidence here. OpenEvidence versus ChatGPT performance on the USMLE. On average, OpenEvidence achieved higher evidence relevance: 24% versus 2 to 10% in a recent assessment. And in a controlled test with vendor-provided numbers, not peer-reviewed, they stated that it scored 100% on a mock USMLE. I usually take any vendor numbers with a grain of salt, but that is impressive. And then in 2024, in a more honest, though not completely peer-reviewed, assessment, ChatGPT achieved 86% on the USMLE. I want to make a point here. Everyone's really excited about exam performance, right? People can memorize a question bank, but practicing medicine is an art and a science. It's not the same as real-world clinical safety. It's not the same as educational context. These tools do a pretty good job at creating empathic scenarios, but there's nothing like actually practicing medicine to teach you medicine. I don't think it's going to replace our brains anytime soon, but it will help us recall information, and I think the USMLE does a good job of assessing a learner's ability to recall important information, to ensure they have a foundation to use tools like these and to continue their medical education.

So let's talk about the different ethical dimensions of AI. There's accountability and authorship. There's transparency and explainability. And then there's this whole area of bias, privacy, and consent.
And the first thing we have to talk about is how these large language models are trained, because where you get your data is very important; that determines how the large language models are going to respond. A lot of this is based on mathematical equations; you have to keep that in mind. It's looking for the next best answer, almost mathematically speaking. So ChatGPT gets a mixture of licensed data, but they go through it with reinforcement learning from human feedback. They don't just ingest every data set, leave it all to chance, and let the AI agents figure it out autonomously; they have some element of human review and tweaking. And then we have Microsoft Copilot, which is trained on public code. You can actually go to what's called a GitHub repository, which is open source, meaning anyone can look at the programming code behind it if they want to. So it's publicly available. User code is not used to train the model; only public code is used for core training. That has some power to it, and some transparency built in, which is refreshing to see. Grok is ingesting everything from the web, including posts from everywhere from Bluesky to X to Facebook to Instagram. I feel like it has less curation, but the advantage it does have is huge compute. If you've followed any of the articles in the tech industry, we've heard about the NVIDIA GPU Colossus supercluster, which has 200,000 GPUs working around the clock on advanced training. It crunches text, images, videos. They use it for general-purpose applications right now, not much for medicine, but I feel like in the next year or so that may change as they seek out more business cases for all this training data. Then we have OpenEvidence. It's got 35 million peer-reviewed medical publications, and it goes back a long way, too.
But they've trained their model to be tailored for clinical and medical decision support; it gives a prose type of output. The point here is that OpenEvidence, instead of going after a broad general-purpose engine, is laser-focused on accuracy and transparency and taking care of people in the medical domain. I think that has a lot of power to it. So personally, if I'm going to be looking into someone's care, I like OpenEvidence. That's my personal bias, and I feel some of you probably feel the same way. Hands up if there are others here who feel that OpenEvidence is your go-to. Sorry, I keep glancing over at you all; I just want to keep this somewhat interactive. So: Paul, Chad, yeah. Dedra, yeah. I love it. It's impressive, the level of detail we're able to get out of that particular LLM.

So let's talk about human bias. First, there's data collection bias. Look at your training data sets: some of them can over-represent demographics, languages, cultures, things you wouldn't normally think about. Think of uploading a huge Excel spreadsheet that's millions of rows long; you're not necessarily thinking about that. A great example I read about while putting this together: a lot of the content being ingested has an English-language dominance. Internet text skews toward English, and so the cultural perspective, if you're asking an LLM about it, can be influenced by that. Then there's labeling. We've all seen that as these large language models ingest social media posts, or when a human curates what comes in, they can apply subjective judgments to the output or the classification of certain content in their model. They can flavor it a certain way.
In politics, you can make something more liberal or more conservative. In medicine, you can make something laser-focused on a few journals, or you can go to wrongdiagnosis.com, if you choose, and display data from that. I'm using very extreme examples here, but you can label or classify things; we call that labeling bias. And then there's historical bias. If you look at real-world data sets, there are past inequalities in there. For example, hiring data may show gender gaps, and we just need to be aware of that and ask why, if it affects our output. And then there's confirmation bias and curation. Some developers may try to sell their LLM to a certain group, so they're going to select sources that match the expectations or values of their customer base.

So the whole point is: it's up to us to use these AI tools responsibly. Keep the human element in your content creation. When you ask it questions, trust but verify the output. Ask yourself: does this make sense given what I asked? Then take a moment and ask: are there biases that could be present in this result? Most of the time there won't be; it's going to be a pretty dry answer. But if you're looking into something that may be charged or may be biased, ask yourself two questions: first, is bias present? And second, does it matter in the context of the question that I asked or the result that I need? And finally: are the cited sources reputable? I can't tell you how many times I've tested ChatGPT with a medical question and it came up with a Reddit or Bluesky post from the internet as the source. We can't do that.
In medicine, we aspire to peer-reviewed, randomized controlled studies and robust meta-analyses; that's our gold standard, right? You have to be very careful when looking at sources and making sure it's pulling from reputable ones. Sometimes it's okay if it pulls from Mayo Clinic or Cleveland Clinic or Duke; a lot of that is curated, editorial content based upon evidence-based medicine. But again, it's got a human element to it, so you do have to at least ask: is this biased? Where is my information coming from? What does that mean in the context of the clinical question I'm trying to answer, or the question I'm asking in order to teach someone? Let me look over here at the questions. Good. And look at my time. All right.

So: over-reliance and skill atrophy. This is a huge risk as we develop our ability to use AI more. I have this fear that it may erode critical reasoning in our learners. Now, I've spent some time with the VTC students, and they're pretty sharp. They're very optimistic; I love it. I started with my new LACE student on Wednesday and love them already. If anyone out there doesn't have a LACE student yet and wants to restore their faith in medicine, take on a first-year medical student and see patients with them. It's just mind-blowing how much it restores my faith in the generation to come. We as teachers need to promote independent verification and reflection. We need to stress the importance of citations. We need to monitor for plagiarism, but we also need to accept that these new large language models are meant to be used, and there's going to be the potential to over-rely on them.
We see this most rampantly in the universities. I've heard horror stories from Virginia Tech professors who are having to use other tools to monitor for LLM output, because people are just writing their essays with a prompt. I could ask for a 500-word essay on Thomas Jefferson, and it would give it to me. But I'm going to show you a couple of things that can help you identify plagiarism. One of them is a tool called Copyleaks. It checks for content integrity; it's looking for patterns. It has done millions of scans and has relatively good accuracy. What it's not really good at is when people have caught on: they write their essay with the LLM, then paraphrase and rewrite the summary in their own words. The accuracy of this can vary if your learner does that. But for people who are just phoning it in and typing a query, Copyleaks will absolutely pick up on that. There's also a tool called GPTZero, and I love that name, by the way; that's great marketing. It uses metrics like perplexity and burstiness, because AI, especially ChatGPT, gives output in a certain tone. You can tell when something comes from GPT, when an email has been dressed up with GPT: it uses certain words; it's overly empathic in its response. For those of us who sometimes give a terse one-liner, you're not going to see that from GPT unless you ask for it. So while this is helpful, there are plenty of documented concerns about false positives with this tool, so it should be used as part of a broader process, not by itself. And then there's LLMDet, which is open source; again, with that GitHub open-source model, you can look at the code and see how it works.
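"Burstiness" is worth making concrete. Roughly, human writing mixes short and long sentences while LLM output tends to be more uniform. Here is a toy sketch of that idea, measuring burstiness as the spread of sentence lengths; this is a simplified stand-in for what detectors like GPTZero measure, not their actual algorithm:

```python
# Toy illustration of "burstiness": variation in sentence length.
# Human prose tends to mix short and long sentences; LLM output is
# often more uniform. Simplified stand-in for detector metrics,
# not any real tool's implementation.
import re
import statistics

def burstiness(text: str) -> float:
    """Standard deviation of sentence lengths in words; higher = burstier."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths)

human = ("Stop. I read it twice, slowly, and the argument still falls "
         "apart in the third paragraph. Why?")
uniform = ("The model produces text. The output is fluent. "
           "The sentences are even. The rhythm never changes.")
print(burstiness(human) > burstiness(uniform))  # → True
```

A real detector combines several signals (perplexity under a language model, token statistics, and more), which is exactly why a single metric like this should never be treated as proof of plagiarism on its own.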
That last one is not really used in the university or medical school setting. It's really looking for falsified data and falsified data sets in research, or falsified patterns in engineering; it's for technical, institutional use rather than a cursory classroom check. You upload a large data set, and it goes through line by line, looks at the distributions, and flags whether data appears falsified and how it's classified. But I wanted you all to be aware that there are tools out there to combat plagiarism. So what's my summary? This is my take: it's okay to use these tools, with the following caveats. In my opinion, look for patterns of consistent plagiarism in a learner. Don't judge based on a single positive; look at the student as a whole. Are they learning? Do they show evidence of learning? Because these detectors do produce false positives. And have candid discussions with learners ahead of time about the use of AI tools. Tell them it puts them at risk for skill atrophy: if you never practice your skills, if you never try to memorize anything, especially the basics, then you'll never understand the nuances of these AI tools' output. And when in doubt, I'm a big fan of asking a student to verbally validate their mastery of the material. There's nothing like asking: why do you think that? Why do you think this patient has this? What's your differential diagnosis here? Did you remember to ask about this? There's still power in verbally validating mastery, and I'm a big believer in it. All of our learners should have a baseline knowledge they can simply recall, because AI is a research assistant, not the absolute authority. We want to promote proper questioning and prompt hygiene.
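Circling back to GPTZero for a moment: the burstiness signal it relies on can be illustrated with a toy score. This sketch only illustrates the underlying idea, variation in sentence length; it is not how GPTZero actually works internally:

```python
import re
import statistics

def burstiness(text: str) -> float:
    """Standard deviation of sentence lengths, in words.

    LLM-generated prose tends toward uniform sentence lengths (low
    burstiness); human writing usually varies more. A toy stand-in
    for the kind of signal detectors combine, not a real detector.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths)

uniform = "The cat sat down. The dog ran off. The bird flew up."
varied = ("Stop. The patient in room four, who arrived overnight "
          "with chest pain, needs review now. Go.")
assert burstiness(varied) > burstiness(uniform)
```

A single number like this is also exactly why false positives happen: plenty of human writing is uniform too, which is why such scores belong in a broader process.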
Prompt hygiene just means asking proper questions, and thinking critically on top of that knowledge base. We can use AI to brainstorm, add our own input, put our own spin on it, but that rests on the baseline expertise we already have; it doesn't replace it. That brings me to my next point: designing your assessments with AI in mind. I ask you to do three things. First, compare: OpenEvidence versus ChatGPT outputs. Which is better for your use case? If you're looking at a patient and need an answer about that patient: OpenEvidence. If you're designing modules: Copilot or ChatGPT. Second, require citation verification. Where is this information coming from? Does it pass the sniff test, or did I get it from some herbal-remedy website? True story there. Third, include AI use disclosure policies. I know we just rolled out Dragon DAX Copilot, our AI scribe; we should disclose to our patients that we're using an AI scribe, and we should let people know when we use AI tools. I really didn't use AI much for this presentation, except to pretty up my slides so they had a theme and to shore up some of my cat memes. But I could very well have written all of this with AI, and you would be none the wiser unless you were versed in spotting it. So I think transparency is important. That brings me to governance and oversight. The data on this is still really new, but we need to create AI use case registries, ethics groups, and audit reviews of the large language models we use. So what is an AI use case registry? This is a busy slide, but basically the registry is an institution's single source of truth for tracking LLMs. Where is your AI being used? Why does the use case exist? What is its benefit to our organization?
How does it operate? Where is the data coming from? What's the model? Who's the vendor? Who owns the data? Who manages it? Who's responsible when something goes wrong and it misdiagnoses? Who approves what's allowed at our institution and what's not? My friends, that takes a group. It takes a bunch of humans getting together and having a candid conversation about what's right for our institution. Did I skip a slide? No, I didn't. Okay, vendor evaluation. I can't tell you how many times a vendor comes to a department chair saying, I'm going to solve all your problems. The first things that go through my mind are: where are your sources of data coming from? How are you training this? Are you going to keep our data safe? We're going to give you patient data, and that concerns me, because we all know what happened with Change Healthcare: they leaked all sorts of data, and they're still recovering from it. And then there's HIPAA and FERPA compliance for healthcare institutions. As a reminder, HIPAA and FERPA mean we do not share patient data or student data with outside parties; patients and students have a right to privacy. So we need to make sure that when we transmit data to these large language models, we are preserving the privacy of our patients, our students, and all learners. And monitor for bias and equity, which is really hard.
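Those registry questions from a moment ago map naturally onto a simple record type. Here is a sketch in Python; the field names are purely illustrative, not any institution's actual schema:

```python
from dataclasses import dataclass

@dataclass
class AIUseCaseEntry:
    """One row in an institutional AI use case registry.

    Each field answers one of the registry questions: what the tool
    is, why it exists, whose data it touches, who owns and manages
    it, who is accountable, and who signed off.
    """
    tool_name: str
    use_case: str              # why this deployment exists
    benefit: str               # value to the organization
    data_sources: list         # where the data comes from
    vendor: str
    data_owner: str
    accountable_party: str     # who answers when it misdiagnoses
    approved_by: str           # governance sign-off
    status: str = "under_review"

entry = AIUseCaseEntry(
    tool_name="DAX Copilot",
    use_case="Ambient AI scribe for clinic visits",
    benefit="Reduces documentation burden",
    data_sources=["visit audio", "EHR context"],
    vendor="(vendor name)",
    data_owner="Carilion Clinic",
    accountable_party="Using clinician",
    approved_by="AI ethics committee",
    status="approved",
)
assert entry.status == "approved"
```

The point is less the code than the discipline: every deployed model gets exactly one entry, and a blank field is a governance conversation waiting to happen.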
Sometimes you have to use an LLM to evaluate an LLM, but I feel we should look at responses for demographic difference patterns. Just ask the question. It doesn't have to be a big deal; just pause and ask, is there a bias here? A tool should be fair and inclusive of all the data. For example, it should include all the students. If we're rolling out a clinical model, it should be useful to all the physicians in the group; if it's a nursing model, it should be useful to all the nurses. It should grade everyone fairly, not based on demographic data but on a set rubric of grading criteria. So how can you integrate this into your education? Use it to generate case summaries and simulations. You can build some great simulated patient cases with this, and great generated questions for multiple-choice tests. You always have to do human review: make sure it's accurate, and make sure your sources are legitimate. With these large language models, make sure a cited source actually exists and isn't from some Bluesky, Facebook, X, or Reddit post. Try to rely on peer-reviewed publications when teaching your students or residents, whenever possible. And remember, there is legal and institutional risk, and we don't know its depths yet. I can't even imagine the legal risks of AI-generated errors resulting in patient harm, but they exist. Ultimately, as it's written on my glassboard over here, the patient comes first, and I feel the buck stops with me: if I'm using the AI tool, I'm responsible for its output, because I'm the one using it. With that said, there's a need for legal and compliance oversight.
So when you go back to that AI ethics committee and use case group, you should probably have someone from legal and compliance embedded on it to get their input. Especially in our organization, we don't want to put the organization at risk of producing bad outcomes. We want patient-facing consent for AI models like DAX Copilot. And just realize this is uncharted territory for medicine, and it's changing every day. You have people like myself, Dr. Speaker, Trip Humphrey, and a bunch of others standing in the gap, trying to keep up with all of this, but it's going to take a joint effort, and it's going to take all of us being mindful of these tools as they become commonplace everywhere. Privacy and data protection, real quick: avoid entering PHI, protected health information, unless it's absolutely necessary, and de-identify it. Don't put in names or medical record numbers; rely on de-identified data, but include the case specifics you need to ask a clinical query. And when you're using vendor data from outside sources, ask the vendor about their retention process. How long do they retain data? I can tell you PerfectServe retains it for greater than 17 years. Can they access your data, and can they query it? What do they do with it? And most of all, do they sell it? Look at Facebook: they sold all of our data and built an LLM off of it. Basically, if something is free, you're the product. So data retention can be tied to legal risk. A future trend arriving right now: retrieval-augmented generation systems. These pair the model with automated information retrieval, agents issuing automated queries, to try to reduce errors and hallucinations.
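The de-identification advice can be sketched as a first-pass scrub before a clinical query leaves the building. This is a toy illustration under stated assumptions about identifier formats; real de-identification requires vetted tooling plus human review, not a handful of regexes:

```python
import re

def scrub_phi(text: str) -> str:
    """Crude first-pass redaction of obvious identifiers.

    Illustrative only: regexes will miss names, addresses, and
    free-text identifiers. Always review before sending anything
    to an outside LLM.
    """
    # Medical record numbers written like "MRN 1234567"
    text = re.sub(r"\bMRN[:\s]*\d+\b", "MRN [REDACTED]", text)
    # Dates like 03/14/2024
    text = re.sub(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", "[DATE]", text)
    # Phone numbers like 540-555-0100
    text = re.sub(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b", "[PHONE]", text)
    return text

query = ("72M, MRN 1234567, seen 03/14/2024, call 540-555-0100: "
         "next step for NSTEMI management?")
clean = scrub_phi(query)
assert "1234567" not in clean and "03/14/2024" not in clean
```

Note what survives the scrub: age, sex, and the clinical specifics the query actually needs, which is exactly the balance the advice above is after.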
So that's using the system to improve itself, basically. Then there's transparency by design: that's where the GitHub-style open-source model, being able to see the source code and where something comes from, is important. And there are efforts to develop AI certifications, both industry certifications and pending regulatory laws, to help put some guardrails around artificial intelligence, particularly for uses like ours. Just remember the ethical framework with my cat picture. If you don't remember much else, remember the cat: beneficence, autonomy, justice. Am I doing good? Am I keeping my skill set up? Am I doing the right thing for the patient or the learner? If you do those three things, you're going to be all right, even if you're a novice. And here are my three check-before-you-trust rules: check your sources; check your references, are they trusted; and ask whether the answer you get is plausible. On faculty development: if you're in teaching faculty, maybe hold a meeting with AI ethics training workshops. How are we using these tools? What errors have we seen? Let's share our best practices for looking things up, and let's share useful prompts: this is how I was able to develop teaching materials, and so on. An example exercise would be comparing the output of OpenEvidence versus ChatGPT. You can sit there and have a discussion: how reliable is this? Is there any bias in the output? Is it ethical to use ChatGPT versus OpenEvidence? Talk it over among your peers. Mariah, did you need to interject? I saw you pop up. No, not at all; I saw you heading toward the end and was going to segue into a question in the chat, but keep rolling. Okay. I'll stay on for a little bit if people want to stick around, but I do want to be respectful of time.
Just remember: AI complements, but does not replace, your human reasoning. All right. Use your evidence-grounded tools. ChatGPT is great for images and for putting together slide decks, but use the creativity with caution. And remember that there are up-and-comers like Grok, with massive compute, quickly gaining ground on other models, so it might be valuable in the future, believe it or not. Anyway, you all got the quiz, so I'm going to give you the answers. True or false: do OpenEvidence and UpToDate both generate new medical evidence? False. They do not come up with new medical evidence; their content is sourced from peer-reviewed literature. Can ChatGPT produce confident but incorrect answers? Absolutely true. ChatGPT can be confident and lie to you. So I'm going to leave you with this final thought before I get into my bonus content: do not allow technology to exceed our humanity. We are clinicians. We are teachers. We are humans first. If you keep that as your guiding compass and use these tools with respect and a modicum of cynicism, I think we will get there. But as Billy Mays would say: but wait, there's more. For those of you who want to stay on, I have a couple of tips on using Copilot to help with teaching. First, rapid creation of case-based scenarios and interactive slides. If you go back and look at my slides, you can tailor the difficulty level of questions in a case-based format. If you're developing problem-based learning cases, this is a good way to do it, especially if you have a standardized patient. So for those of you doing first- and second-year basic science at VTC, this is a tool for you. You can also take your old slide decks and refactor them to bring them up to date.
For example: take this slide deck for interns on septic shock and rewrite it for third-year medical students, using simpler language and adding a glossary. Or: convert this core module into a high-yield, one-page infographic for quick review during a curriculum meeting. Be careful that you're not stealing images from somewhere; make sure you source your images and have fair use of them. But it's very powerful. And finally, you can use Copilot for knowledge assessment. You can ask it to generate 15 multiple-choice questions on cardiology pharmacology, each with four answer choices. But you need to ensure accuracy with manual review, and you need to protect your question banks as usual: don't just leave this in an unprotected Word document that you copied and pasted, where someone can get hold of it. And make sure the answer choices make sense and are correct. So: keep calm, do good. We've all got this; just keep beneficence in mind. Now, the meeting's almost over, but there's always someone who asks questions, so I'm here for you as long as you need me. And of course I've got all the references you could want, but I won't bore you with those. Looking through the chat real quick: I've tried to use AI for MCQ generation, but getting it at the right level has been difficult. So, Dr. Damat, I would definitely start the prompt with a role: I am a physician developing a multiple-choice question on this topic for this level of learner, and be very specific. Then, when you get the output in ChatGPT or Copilot and it's not quite what you want, ask another query: refine these questions, but make them more X, or make them more tailored to Y, and you can refine the output over time. Take care, Dr. Skolnick; I appreciate you coming. Yes, vaccines in OpenEvidence versus ChatGPT is definitely eye-opening, Doctor.
It is, because the data sources are different, absolutely. Let's see. Taylor asks: as AI becomes more integrated into clinical education and practice, what do you think are the ethical obligations around disclosing when AI has contributed to content, whether that's a patient note, a message, a lecture, or the creation of an exam? And this question was generated with AI assistance. I love you, brother; Taylor is one of my favorite people to work with. So, I personally believe we should have some type of general consent that patients sign stating that we use large language models to assist with tasks in their care. With that said, all of our DAX notes carry an attribution: generated with DAX Copilot. And I've refined my four-sentence spiel to: hey, I'm using my AI to record every part of our visit and write the note for us; I'm going to show you the note after we're done, because it's really cool; are you okay with that? So I make sure everyone knows what I'm doing. I'll even bring up TruxGPT and show them: hey, I built this model, and this is what it's recommending, and I agree with this part and not with that part, and here's why. That's true transparency: you're involving the patient in the tools you're using and making sure they're engaged. And that's important, engaging the patient in what you're doing, because ultimately you're caring for the patient. They have trust in you, so you want to make sure you honor that trust. What other questions do you have? I'm an open book, but I am going to take a swig of my Mickey Mouse water. We have about two minutes left, and of course I understand if folks need to run, but if there are any other questions, feel free to pop them in the chat or speak up. In relation to Dr. Damont's question about creating the prompt, I know there are different models out there for prompt creation.
Is there one that you particularly like to use? Well, right now I've been blessed with access to the enterprise version of ChatGPT with GPT-5. So here's a thought for you all: you can actually ask GPT-5 to help you generate prompts and refine them. So I guess that's another tip. If you're in the enterprise group, I don't know if they're going to be expanding that, but John Suite has access to it, so if you want to see it in action, stop by John's office and he can show you. And I'm not that important. I want it, please. Well, it relies on having enterprise ChatGPT, but yes, as soon as that's available, I'll open it up to everybody. I have no problem with that, as long as you remember that I make mistakes; I'm a human being. But I will continue to refine and work on it. And OpenEvidence is probably a better source, especially if you're going to be citing studies for learners. Mine is really a very quick decision engine for: I've got a patient in front of me, I have a clinical question to solve, I'm at this point in the algorithm, what do I do next? Well, thank you so much, Dr. Truxillo. Again, I know you're willing to stay on if folks have additional questions, but we are at time for today. So thank you so much to everyone for joining us for the last hour. Fantastic conversation, and I think we'll continue to engage in conversations like this as AI continues to evolve very rapidly around us. Thanks for getting us started here. If you have additional questions, feel free to reach out to me via email at rrt@carilionclinic.org. I wish you all a great start to your week. I'll see you around. Absolutely. Happy Monday, everyone.
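As one last piece of bonus content, the role-first MCQ prompt pattern from the Q&A can be captured as a small template. The wording below is illustrative, not a vetted prompt:

```python
def mcq_prompt(topic: str, learner_level: str, n: int = 15, choices: int = 4) -> str:
    """Build a role-first prompt: state who you are, the topic, the
    learner level, and the exact output format you expect."""
    return (
        f"I am a physician developing {n} multiple-choice questions "
        f"on {topic} for {learner_level}. Each question must have "
        f"{choices} answer choices with exactly one correct answer, "
        f"a one-sentence rationale per choice, and a citation to a "
        f"peer-reviewed source for the correct answer."
    )

p = mcq_prompt("cardiology pharmacology", "third-year medical students")
assert "third-year medical students" in p and "15" in p
```

When the first output misses the mark, follow up with a refinement query (make the distractors harder, simplify the stems) rather than starting over.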