Precision Education: Harnessing Data and AI Analytics to Enhance Learning and Assessment
September 19, 2024
Speaker:
- Jesse Burk-Rafel, MD, MRes
Assistant Professor, Division of Hospital Medicine
Director, Precision Medical Education Lab
Institute for Innovations in Medical Education
NYU Grossman School of Medicine
Objectives:
Upon completion of this activity, participants will be able to:
- Recognize the confluence of societal and medical education needs driving the desire for greater precision in medical education.
- Describe a "precision education" conceptual framework that applies to individuals, programs, or health systems.
- Identify needed data and analytic capacities for academic health systems to actualize precision education.
Invitees:
- All interested Carilion Clinic, VTC, and RUC physicians, faculty, and other health professions educators.
*The Medical Society of Virginia is a member of the Southern States CME Collaborative, an ACCME Recognized Accreditor.
This activity has been planned and implemented in accordance with the accreditation requirements and policies of the Southern States CME Collaborative (SSCC) through the joint providership of Carilion Clinic's CME Program and Carilion Clinic Office of Continuing Professional Development. Carilion Clinic's CME Program is accredited by the SSCC to provide continuing medical education for physicians. Carilion Clinic's CME Program designates this enduring material activity for a maximum of 1 AMA PRA Category 1 Credit™.
Physicians should claim only the credit commensurate with the extent of their participation in the activity.
Great, so I'm really happy to be here. Again, my name is Jesse, I'm a hospitalist, and this is going to be fun. Hopefully this introduces a new concept for some, and is old hat for others: this idea of precision education and how we think about lifelong learning. I'm going to try to hit across the continuum throughout the talk today. I want to acknowledge work from a lot of folks in our lab. Marc Triola is our director, but we have a huge team of faculty, plus students and staff I haven't shown here, all of whom make this whole thing work at IIME, the Institute for Innovations in Medical Education. The learning objectives are really all around the why, what, and how of precision education, which I think were sent around, and I have a few disclosures, largely related to funding.

I like to start with the why; you should always start your talks with your why. I can't really do the interactive thing here, but if you do a time lapse of the northern sky, there is one bright dot at its center that stays fixed: the North Star. It reminds us to ask what our North Star is. This is an AI-generated image of a patient, and our North Star in education has to be patients: high-quality, equitable care of our patients. That's something CBME introduced to education, and it has taken root. I also like to think about our future patients. This is my kids and my wife being goofy, and those are the future patients: ourselves, our kids, the families of our trainees. When we think about our social contract and why we need to do this well, this is who we should be thinking of.

So how are we doing with this social contract? I would make a pretty strong argument that we are not meeting the social
contract of honoring high-quality, equitable care for all. These are admission rates across three hospitals, by physician, and what I'm trying to demonstrate is that at the physician level there is really significant, unwanted variation in care. This is just one study, you may say, and it adjusted for a lot of confounders, but I could show you many of these caterpillar plots of inter-physician variability, even within institutions, in clinically meaningful things: here is antibiotic prescribing; here is opioid prescribing across different facilities. Again, widely ranging performance out in practice.

You might say, well, let's just teach those old docs new tricks. Ontario thought about this: they identified the high prescribers and sent them a letter. What happened? (By the way, the US has the same distribution of over-prescribing.) It didn't really work; folks still prescribed at the same rate. One version of the letter that asked them to shorten duration did shorten duration a bit, but the effect size was not large, and of course a letter is not a strong forcing function. So I think you'll appreciate that we might like to intervene upstream: we really want to establish high-quality, equitable care in training, and there is strong evidence that what you establish in training persists durably.

This is Konrad Lorenz with ducklings following him around. In the 1950s, with geese and other birds, he set up contraptions to ask when ducklings imprint on a mother; in his case he took them away from the mother, which is the sad part of the story, but he showed there was a critical duckling age at which they imprinted on someone else. It wasn't just at the very beginning, and if you got there too late, the imprinting could not happen. Other studies in medicine have demonstrated this imprinting idea: if we look at the program training effects
and how they lead to later practice, there is something real there: shockingly, if you train in a place that does X really well, the graduates will do X really well, and vice versa.

So at the individual level, how do we operationalize this? Performance follows a learning curve, if you can measure it. This is the classic Dreyfus and Dreyfus model that Martin Pusic and others have popularized: moving from novice to expert through deliberate practice, with time in training leading to performance. What does this look like in the real world? Martin and others had residents rate ankle radiographs, from definitely abnormal to definitely normal, and measured their accuracy over cases completed. As you see, there is a learning curve, a growth curve, and the group average in black follows the really smooth learning curve I showed you; but no individual does. No individual does. So how are we to handle that?

Again, consider two curves from two trainees. We are trying to get to competency-based education, but the current state might be this: we say you need to do X amount of time, or X number of cases, and that is at the mercy of inter-individual variability during training. Fixed time, variable outcome; that's what time-based training looks like for those two trainees at a point in time. Folks say competency-based is easy enough: we just set a criterion, a performance threshold we want everyone to reach. Time variable, fixed outcome: mastery-based learning. Not so easy. The real world is that there is imprecision in measuring those learning curves, in measuring where folks are along their learning trajectory, and that has precluded the sound early judgments of separating those curves: saying trainee A, maybe after only 75 cases, could have stopped doing activity X and worked on activity Y
that they needed to do more of, while trainee B was actually plateauing and we needed to entirely change our approach to how that person was learning.

So I'm going to share the thesis that every trainee can and should be exceptional, and here are six ways I think programs, and we as educators, can do that. We have to integrate longitudinal, meaningful, high-density data, and I'll show more examples of this. We need to focus our assessments on behaviors and outcomes rather than knowledge. We need to develop flexibility in how folks progress and learn; without flexibility, you can't do individual-level interventions. We need predictive analytics and AI, and I'll share why that matters for coaching and educating. We need systems, technology, and resources, which is always a big question I get. And then individual-level precision, which I've really hit already: how do we get to that precision?

That brings us to this idea of precision education. Marc and I took that landscape and said, let's call this precision education: a systematic approach to measuring those growth curves, measuring how trainees are progressing, defining the curve and ideally accelerating it. There are plateaus, maybe at the transition from UME to GME, or from clerkships to the post-clerkship phase, and we also want to smooth the path; we don't want all that undulation of development and forgetting, development and forgetting. A working definition: a systematic approach integrating longitudinal data and analytics to deliver precise interventions at the individual learner level, addressing each learner's needs, which ultimately has to do something about patient, system, or educational outcomes. It has to be outcome-informed; that's the CBME piece. Sanjay Desai and others at the AMA have iterated on this idea and broadened
it beyond training to patients, schools, and programs: we start with inputs of data; we apply analytics to generate insights and visualizations; we run iterative processes of co-production to generate precise interventions, whether educational, assessment, or coaching, and I'll show you some of each; and then we have to learn from our educational process, generate outcomes, and close the loop. This is the classic PDSA cycle; it's built on top of PDSA, continuous quality improvement, the master adaptive learner. Many of you may have heard of these concepts; this is standing on the shoulders of those giants.

So how do you actually do precision education? I'm going to focus first on the inputs and outcomes pieces; they are really two ends of the same snake. We are trying to develop what we call a precision education data hub, where we take information about trainees and programs in our institution and bring it together in an education data warehouse, with data scientists and informaticists as well as clinical and educational experts. It's not enough to just have the data, and it's not enough to just have the informaticists or data scientists; you need both, and you need domain experts to guide the work. We bring in evaluation and assessment data. We bring in experiential and educational exposures, recognizing that the vast majority of education is not the explicit curriculum; it's the experiential curriculum, what's seen out there, so if we're not capturing that, we're missing the vast majority. Transitions and handovers have been talked about so much, so we try to capture elements of those. I'm going to talk quite a bit about in-training clinical data, because I think this is the cutting edge of where assessment is going. And then there's the idea that we should, of course, measure what happens after training. The
real litmus test for our training programs is what happens when our trainees go out into our communities and care for us and our loved ones, so we need to close the loop on that data.

We've got a number of projects; I'm not going to belabor them, just show a sampling of boxes here. We're analyzing narrative comments and looking at empathy; we're looking at exposures and performance; we're developing nudges toward educational resources; we're thinking about admissions and transitions; and then there's a ton of in-training EHR data, diagnostic accuracy, ordering behaviors, and the linkage into practice. These slides, I believe, will be provided, and there are PMIDs for papers within each of these domains, should you have interest.

Let's get into some examples, still on the inputs-and-outputs idea; we haven't gotten fully into the analytics piece. On the exposure piece, how do we harness that? This is work from Carl Drake and Dan Sartori, who linked ICD-10 codes to American Board of Internal Medicine content domains, took the residents at one of our campuses, and asked: across 150,000 encounters, what are folks seeing in the inpatient and outpatient settings, which you see colorized here? These boxes show the interquartile range; I'll tell you there are long tails and outliers on each. The first finding was clear enrichment of certain experiences, like infectious disease and cardiology, with some experiences seen very little, and some trainees seeing double the cases of their peers. What was really quite interesting is that when they overlaid our Internal Medicine board content outline, what the board says you should know, sometimes it aligned very nicely, like neurology, the fourth bin over to the right, and other times it was wildly off; look at medical oncology on the right. At least if we believe these data, which
come with many caveats: we are not seeing good alignment there. So it gets us thinking. This is the exposure piece: what should we be doing? Should we be assigning admissions differently? OSCEs differently? Assessments differently? You can start thinking about what precision would actually look like. Maybe there are quite a lot of degrees of freedom to play with, even if you don't change the duration of training or the structure of your training program.

So that's the exposure piece; on the assessment side, how do we get precise in assessment? This is adapted from Lambert Schuwirth and Cees van der Vleuten, and I think it has made the rounds: as you get increasing assessment breadth, sources, density, longitudinality, context, and relevance, the pixels of the picture coming in about each individual trainee hopefully get clearer, and you can see where this is driving: to the Mona Lisa, here rendered in 3D by the Louvre. There are some challenges with this analogy. It assumes you know where the pixels go and how they sit in relationship to one another; as one person on Twitter put it, competence is a jigsaw puzzle without the box the pieces came in. Or we may prematurely close: maybe we're getting the background road but not the main event, the Mona Lisa herself. So there are challenges, but this is the premise of precision in assessment and programmatic assessment.

I want to share some emerging approaches in this space that I think will be coming to a program near you soon. The base, what has always been done, is expert observation: workplace-based assessments and OSCEs. There's a wonderful paper, which I helped with on the editorial side, from Ben Kinnear and colleagues out of the University of
Cincinnati, drawing an analogy between our assessment programs and Moneyball and the evolution of baseball performance assessment. You can think of expert observations and workplace-based assessments as your scouting or game film: yes, it's a gold standard, you're getting multi-source feedback, but it's riddled with bias and subjectivity, and it adds workload for assessors. I'm sure you struggle, as we do at our institution, to get people to fill out high-quality comments on their trainees. That led into the era of learning analytics and EHR-based measures, things like RSQMs and TRACERs, which I'm going to define in a second, and EHR metadata. These might be analogous to wins above replacement or on-base percentage in baseball: going beyond human expert judgment to measurements we think predict the outcomes we care about; much more objective, quantifiable, and scalable, but with lots of challenges around data integration, attribution, and intangibles. And I think the future will be more tech-enhanced skill measurement: wearables, motion and location capture, and I should add ambient AI. That could be analogous to the current state in baseball, where they measure every pitch for how it spins and moves, and the launch angle of hits; they have atomized performance down to its irreducible elements. Very high density and natively attributable: if I'm a pitcher and I throw a pitch and it spins like this, that's me; there were minimal other effects, maybe wind and ballpark effects, which baseball actually handles. But very costly and very difficult to operationalize.

So let me hit a few of these newer-generation assessments, getting to that precision idea, that density of insight. Dan Schumacher and colleagues developed a concept called resident-sensitive quality
measures (RSQMs): measures that are meaningful for patient care and to trainees, but reasonably attributable to a single individual. In their initial paper they studied three conditions in pediatrics: asthma, bronchiolitis, and head injury. This is just the asthma exacerbation, for which they bundled 21 asthma quality measures; you see some of the types of measures on the right. When they make a composite score at the encounter level, there's quite a range of performance: some encounters where all the high-quality things are happening, and many encounters where only 50 or 60 percent of these quality measures are happening, and these are, in the peds world, standard things for an asthma exacerbation. Now, they used human chart review, so this has been very difficult to scale more broadly. So we worked with them and others to develop a concept called TRACERs, which adds to that attribution idea the elements of automation, scalability, and real-time availability, elements we thought would matter a lot for scaling and for timely feedback. It's not so useful, even if scalable, if you only get your dashboard results at the end of the year; what you really want is feedback right after you saw the case, because that's when it will change behaviors, to get at those old-docs-new-tricks.

So let me show some examples of how we're harnessing clinical care data to measure performance. This is clinical reasoning in internal medicine residents. I'm showing two real admission notes from trainees, de-identified: one of lower quality, one of higher quality. These map to EPAs for residency and to ACGME competencies, and yet trainees say in many studies that they just don't get much feedback about the quality of their documentation or their clinical reasoning. So we human-rated these notes using a scale we developed here, work from Verity Schaye
and colleagues, and developed an AI model on top of it to say, in the first instance (though we've gone beyond this since), was this low-quality or high-quality clinical reasoning documentation? What happened? One, we could do it, with high performance measures. Two, we detected profound variation at the resident level. This is a plot of resident-level performance, the percent of admission notes each resident wrote rated high quality, and you see a wide range. We also saw major program-level effects: we have two programs, one at a community hospital and one in Manhattan, with very different admission processes for residents and different systems factors, and very different performance. We were able to analyze over 28,000 patients and 37,000 notes.

What this also lets us unpack is how performance is contextualized by the clinical learning environment. This is performance on the y-axis against hour within shift, color-coded night versus day. Our night teams are actually relatively protected compared to our day teams, they're just there to do admissions, and the quality of the night teams' admissions, from a clinical reasoning documentation standpoint, was higher, as you see; but quality fell off as the hours went on. We wondered, is that people getting tired? What's interesting is that if you look at the index of the note within a shift, the same thing happens: if it's your first note, quality is much higher than if it's your fifth note of the night. We can use multilevel modeling to figure out which it is, but the point is that we're not only measuring variation in performance on a task, but also understanding how context matters. As a program leader you might say, maybe five is the number of admissions where we need to stop, even though the ACGME will allow eight or ten; I don't
know what the number is, but it's a large number for our complex patients. If I got eight admissions overnight with our patients, transplant and all that, it would be a very rough night, and not good-quality care.

All right, another example, again in internal medicine residents (you heard I'm a hospitalist, so I do a lot of IM projects): insulin ordering for type 2 diabetes. The admission order is the triggering event, and at that time we attribute an encounter's orders to a given trainee, in this case the trainee who placed the plurality of medication orders. Here we're not using AI; we're using queries of our electronic health record, and again you see hundreds of residents, thousands of patients and encounters, thousands of orders. This would not have been possible at all with chart review. It was not simple to develop these queries, but our hope is that we develop them, share them, and scale this.

So how does this look? One of the measures we wanted was ordering of long-acting versus short-acting insulin. It's generally recommended that patients with type 2 diabetes in the hospital may need some insulin at a low dose, or long-acting insulin if they're higher risk, so we wanted to look at those behaviors, and I want to show you how you can phenotype residents. This is the subset of residents with at least five encounters in the period we analyzed. You get the vast majority of residents, 238, who favor short-acting at about a 2:1 ratio; the dotted line is the least-squares fit. The y-axis is long-acting insulin and the x-axis is short-acting insulin, each the proportion of type 2 diabetes admissions, where you placed all the orders, in which you chose to order that insulin. So we see a trend, but we clearly see a very different range of insulin ordering rates. Is 75% right? Is 25% right? I
don't know; I'd have to look at each patient, think about it with endocrinology, think about the risk factors. But immediately we're seeing performance variation within the bulk of trainees. What's interesting is that there were other phenotypes. There was a phenotype of resident who, every time a patient with type 2 diabetes comes in, orders both long-acting and short-acting, no matter what: about 40 residents sit basically along the one-to-one line, at different overall rates of ordering, but when they ordered, they ordered both. Then you had the never-long-acting crew: another 36 people who never ordered long-acting insulin, which is pretty hard to do in our patient population; we have some very high A1cs, people on home insulin; these are not a metformin-only group. You have a smaller subset who favored long-acting insulin, ordering it above and beyond their peers. What's going on there? What did they learn in medical school that persisted into residency? We could unpack that qualitatively. We had a couple who never ordered any insulin at all, which I thought was quite interesting, one person who always ordered short-acting, in 100% of their encounters, and folks who never ordered short-acting, only long-acting, in the cases they had. Just to say: different phenotypes pop up, and it begs the question of what's going on.

You might be asking, what about all the risk factors, what about confounding? The beauty is that when we put this together with the EHR data, we can control for confounding. When we build models, in this case generalized linear mixed models to account for confounding and for random effects at the patient level, we can assess each individual resident's propensity for ordering these
types of medications. This is plotted again, one against the other, but in odds-ratio space, where 1 means your odds are the same as your peers' and 2 means double the odds of prescribing that medication. There are four quadrants, and what we see is scatter: again there is range, there is variation, and there is statistically significant variation in performance. You can see folks who are higher on short-acting, higher on long-acting, higher on both, and lower on both. So in addition to the crude phenotypes, we're also seeing risk-adjusted differences in behavior. And I hope you take away that this is a model for how we could apply the same approach to opioid ordering, pain medication ordering, antibiotic choices, and so on in the ordering space; maybe imaging orders too.

Now, EHR metadata is a much-talked-about idea, so we've been looking at how we can mine EHR metadata to get more precision about trainee performance. We send millions of messages a year in our secure chat (we use Epic, with Secure Chat), and this is an actual messaging network for a single IM resident: in one case they were the central person, and in another they were peripheral, as you can see in the star. We can measure the distribution of how central they are in their messaging network, and compare across interns. Now, what does this all mean? What is good, what is bad? There is a lot of work to be done in the validation space to understand the meaning, but I'm starting with: where is the variation, where are the performance differences? And we like to benchmark and make dashboards for folks to look at.

Now, the third era. We talked about the first era of workplace-based assessment; the second era was EHR measures, these RSQMs and TRACERs; the third era is tech-enhanced skill measurement, wearables, and motion capture. This is really cutting-edge stuff.
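As a toy sketch of that messaging-network centrality idea: normalized degree centrality, the simplest "how central is this person" measure, can be computed from sender–recipient pairs with nothing beyond the standard library. All IDs and messages below are invented for illustration; real analyses would draw on secure-chat metadata at scale.

```python
from collections import Counter

# Invented (sender, recipient) pairs standing in for secure-chat metadata.
messages = [
    ("res_A", "res_B"), ("res_A", "res_C"), ("res_A", "nurse_1"),
    ("res_B", "res_A"), ("res_C", "pharm_1"), ("nurse_1", "res_A"),
]

# Build an undirected edge set so repeated back-and-forth counts once.
edges = {frozenset(pair) for pair in messages}

degree = Counter()
for edge in edges:
    for node in edge:
        degree[node] += 1

# Normalized degree centrality: degree / (number of other nodes).
n_nodes = len(degree)
centrality = {node: d / (n_nodes - 1) for node, d in degree.items()}

most_central = max(centrality, key=centrality.get)
print(most_central)  # res_A — the hub of this tiny network
```

With real data, the same per-resident centrality values could be benchmarked against peers, as described above; what counts as "good" centrality is exactly the open validation question.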
So this is Brian Garibaldi's work out of Hopkins, where they put motion-tracking devices on every resident, and I'm showing you the bottom five and top five interns by how much bedside time they spent. There are ranges across interns and within interns: within each column, say intern 43, the points are different days of the year, and some days they spent 80% of the day in the patient room, other days 0%; but even across interns, the median line, the bar in the middle of the box plot, varies substantially. What was really interesting is that this kind of tech-enhanced measurement of behavior let them ask what's going on on different wards, what's going on during rounds. It's interesting: on gen med, folks go into the patient room a bit more during rounds, whereas on the ICU rotation they move to the hall during rounds and mostly stay there rather than going into the patient rooms. So things about context, about the environment, come to the fore with these kinds of devices.

This is Carla Pugh's work and others on motion tracking. They built a simulator, not a real patient, with motion-capture sensors, and the technical task is to repair a hernia defect while your movements are captured. They look at the quality of the hernia repair and ask who did really well on the outcome versus who didn't, and again you see a distribution of performance. I want to keep hitting that idea: no matter what you measure, there is a distribution of performance. Is it so shocking, then, that out in practice we have a distribution of performance across practicing doctors? We would like to get to a place where everyone gets high-quality, equitable care; we are not there yet, and I worry our education is letting down our patients. So they were able to segment those
two groups and say, let's look at the best performers versus the less-good performers and look at behaviors. They could isolate when the hands were at the instrument versus near the stand versus the surgical site, and then, what's this stuff over here that I circled in red? Unnecessary movements. So there was a signal, based on motion capture, of what correlated with better or worse outcomes on the hernia repair, and you can think about what you might do with that. A little intensive, though, to deploy the sensor devices and run your own OSCE that way.

So this was an incredible paper recently out in JAMA Surgery, where they first took YouTube videos and trained a model, in this case a vision-based neural network, on those YouTube case videos, literally free videos on YouTube. Step one, they trained it just to recognize hands, instruments, devices, and movement. Step two, they applied it to the videos in their own ORs and compared a trainee versus an experienced surgeon. What you'll immediately see, without getting into all the details, is that red is fast and blue is slow: the amplitude of a trainee's movements versus the surgeon's is much different, and so is the speed; slower, lower-amplitude, efficient movements from the experienced surgeon. You can see in the plot at the bottom right what's going on there as well. But note that even among experienced surgeons, in that upper right-hand corner there's an orange triangle, one who is not doing so great; and conversely there are some incredibly gifted surgical trainees relative to their stage of training. So how could we use all this routine data in our training programs?

The final piece of the inputs-and-outcomes story is how we measure the final product, the distal outcomes. We've been looking at milestones
across programs and opioid prescriptions and saying can we look at Cross programs and what we can find is that we can segment programs based on programs that produce very high prescribers of opioids 13 million more pills um likewise we can look at prescribing patterns at the level of medical schools and residencies so in this case we're looking at a big code of Medicare prescriptions billions of Medicare prescriptions and we said here's NYU data whether or not you went to NYU for medical school residency are both what's your opioid prescribing rate or your brand name prescribing rate what do we see we see a a hit each time you you have that exposure to NYU training this is kind of capturing all of what NYU training was and we see lower opioid prescribing rates but higher brand name prescribing rates at NYU um and so what might we do with that how might we look at what was going on in our curriculum what's the exposure what's happening is it where they're going in practice is it what they're seeing in training you know a lot needs to be unpacked there but helps us get at this idea this work supported by the AMA all right analytics so how do you do all this I hope you've probably seen that there's quite a bit of analytic pieces and I'm not going to dwell on these because I want an interest of time get to a little bit more meat but there are maturity models of analytics of going through descriptive versus Diagnostic in the past to predictive and prescriptive analytics more complexity as you go left to right on this there's also this model of sort of the diw model out there in data science of this idea of moving from data to information to knowledge to actual wisdom and that much of what analytics and AI is seeking to do is to take you from the raw element the data to the actual wisdom data itself is completely useless so that is the Crux of what analytics really is I believe um last Model that exists out there is the the four vs of data science such is volume velocity 
variety, and veracity. These are all challenges we face in doing analytics: high-volume data, it's moving quickly, there are all kinds of types, and some of the data is good and some is bad; how do we handle that? So, just a few frameworks before we jump in. This is an example of how we used analytics to phenotype learners. This is application data to our medical school across multiple cohorts: every row is a variable, every column is a matriculating student, nearly a thousand. What the team (published down there, as you see) was able to do with k-means clustering was identify four phenotypes based on performance: a solid signature, an improving signature, a static signature, and a rising signature, and there are reasons for those names in the text. They were able to apply those phenotypes and ask: how do these students do later in our medical school, and is that related to something meaningful? Sure enough, they found that these phenotypes mattered, and that certain performance patterns followed from what would otherwise all look like great GPA scores (all good, all good, all good) but with slightly different trends in performance, the phenotypes. So identifying latent signatures of success let them say, okay, maybe this is who we want to recruit. Of course there are some limitations here: this is largely medical knowledge, and it's a narrow definition of success; I'll call that out. But it's an idea of how analytics can take data to insights, to maybe some knowledge or wisdom. I think high-performance computing is pretty important to all this. We use a lot of high-performance computing and GPU time. This isn't available to everyone, but this is the real world today: Nvidia is worth more than many of these companies you see at right combined, because the
world revolves around GPUs now, in silicon. I would be remiss if we didn't talk a little bit about AI today. Just to set the stage: AI refers to systems that perform functions similar to those associated with humans. Machine learning is a subset of that, where you're detecting patterns in data using different types of algorithms. Deep learning is a subset of machine learning, where you're using neural networks that mimic neurons across multiple layers, so you can send in a picture of this "2" and it figures out that it's a two, going through these layers. And then generative AI came out of that world: let's develop language models that now use all sorts of data (you couldn't really even call them just large language models anymore: text, images, video, music) and reinforce them to make them chattable, to respond. GenAI has seen crazy adoption. Compared to Netflix, compared to Twitter, compared to Instagram, the adoption rate of ChatGPT over its first days was meteoric. And the adoption rate among the people we care about, undergraduate learners, is pretty high. This is a fairly recent survey; we don't have a survey on trainees, medical students and residents, that I'm aware of, but about 50% admit to using it at least once a week. These models predict the next word, or the next pixel, or the next sound. So you can now do things like take a picture of something (this was on Twitter, so I'm not sharing anyone's anything) and say: tell me about the side effects of this thing. And it answers: oh yeah, this is indeed an antibiotic, it does this, it treats that, here are the side effects. It's just really quite incredible where we came from. So how are we using GenAI as part of our analytic strategy at NYU? Some obvious ways are mapping curricula: taking a lecture description and, with a prompt, mapping it to different keywords in the USMLE content outline or MeSH terms. We're looking at lecture transcripts and asking
what are the key points. So this is a student's dashboard: every day they see their "My Day," a summary of all the lecture transcripts and what they heard about today, hopefully engaging with the information. We need to do assessment to see whether this is working, whether it's doing something above and beyond, but it's fun. We're also taking goals. This is our coaching application, with a real goal a learner entered (I think this was a resident, actually): "I want to expand my knowledge of physiatry" (which is PM&R) "and these are some ideas I have." They can click a button for this GPT and it will try to give NYU Grossman School of Medicine-specific advice. It's incredible how much these language models have scraped off the web about NYU and other places. I'm an internist; I don't know anything about advising this resident on PM&R. So it helped take that goal and really build robustness, and you can regenerate as many times as you want. This next one is on a somewhat higher-stakes scale. Those were fun and cool; this is actually summarizing comments. In this case it's very formative (this is not changing promotion or tenure), but this is faculty data, and combining medical student and resident comments from two different systems was always a challenge; you all may have the same thing. For admins: how do they see a summary, a quick snapshot, of their faculty? So we take all those comments and combine them. For residents, same thing: let's look at faculty, peer, student, and OSCE comments, bring them into one place, and call out strengths, call out weaknesses, really bring to the fore what it thinks is important. Obviously validation work needs to be done before using this for high-stakes decisions, but from a formative lens we already see value here compared to the current state. We can't compare to the almighty; we have to compare to the current state, and in that way this is really a real advancement.
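[Editor's note: the comment-summarization workflow described above can be sketched in code. This is a minimal illustration only, not NYU's actual pipeline; the source labels, the prompt wording, and the idea of a `call_llm()` endpoint are all hypothetical placeholders.]

```python
# Illustrative sketch (hypothetical, not NYU's actual system): combining
# narrative comments from multiple assessment systems into one prompt that
# asks a language model for strengths and weaknesses tied to quotes.

def build_summary_prompt(comments_by_source: dict) -> str:
    """Assemble comments from several systems into a single summarization
    prompt for a FORMATIVE (low-stakes) faculty review."""
    sections = []
    for source in sorted(comments_by_source):
        body = "\n".join(f"- {c}" for c in comments_by_source[source])
        sections.append(f"### {source} comments\n{body}")
    instructions = (
        "You are assisting with a FORMATIVE (low-stakes) review of a faculty "
        "member. Summarize the comments below: call out strengths and "
        "weaknesses, and quote the specific comments supporting each point.\n\n"
    )
    return instructions + "\n\n".join(sections)

prompt = build_summary_prompt({
    "Resident": ["Great teacher on rounds.", "Feedback sometimes comes late."],
    "Medical student": ["Explains clinical reasoning clearly."],
})
# The assembled prompt would then be sent to the institution's secure model
# endpoint, e.g. summary = call_llm(prompt)  # call_llm is a placeholder
```

The point of the sketch is the aggregation step: comments scattered across systems become one structured, low-stakes prompt, which is where the validation work the speaker mentions would need to focus.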
All right, it's about 41 after by my clock. A couple more slides on intervention. We've done inputs and outcomes, and we've talked about how you analyze your data to generate those insights; now I want to get to intervening, how you would actually precisely intervene. We wrote this paper with Carl Drake and others about how we bring theory into intervention: how you might take coaching and apply the relevant theory, or dashboarding. So let me show you some dashboards; digital tools are one of the common ways we intervene. The top one is a diagnostic-exposure dashboard for our residents; the bottom is that clinical reasoning work I was mentioning, a newer version that gives them performance scores. What you'll appreciate is that in the bottom one we try to set criterion benchmarks: here's your target, here's where we want you to be, not here's where you are relative to peers, which is the top plot. We link it directly to the encounter, so they can go back and look at the actual individual encounter, and coaches and PDs can look at outliers, organized by board domains. Here's an example from holistic review. I'm actually going to skip over that in the interest of time, but it's been published in a couple of papers down at the bottom: we developed a decision-support tool using all that data to help intervene on the decision to interview in our residency programs. Another idea would be targeting educational interventions; that's what we want to intervene upon, while the theory might be self-determination theory and the tool might be a nudge strategy. These are nudges, and these are real: in the morning we have queried the information in the electronic health record to say, okay, you saw case X yesterday; here's some information, here are some educational resources you might find helpful. They're timely and relevant, and they come immediately to the resident's email: they can go listen to
the podcast, or read an UpToDate article. The students receive a similar thing; it's a little cleaner for the students, who have questions available and actually get PubMed articles and summaries of both a review article and an RCT, based on the exact case they just saw. Now, what does it all do? Do they open it? These are the implementation-science pieces that are hard and that we're trying to work on, but as a tool we're getting a lot closer. So I'm going to close with the future. I think you all, programs and educators, are the tip of the spear for addressing that unwanted variability, for meeting that social contract. Precision education, I think, is an organizing framework for this outcome-oriented education. You've got to collect inputs and outcomes from multiple sources, and data is the new currency, as I've mentioned; you have to invest in data. Dashboards allow you to align educational and clinical informatics. I didn't talk about that so much, but linking those worlds is actually critical, bringing them a little closer. And then you've got to develop, as an institution and with your teams, the analytic maturity to start to do these things. Theory still matters; tools matter. And then learners: I didn't talk a ton about it, but they have to be engaged in co-production. They're usually the smartest people and will kind of put you to shame from an AI lens. Your students and residents are using this already, so "get on board" is the lens we've been taking: it's coming to us whether we would like to slow it down or not; that's what's happening. So we created a policy; we have a secure instance so they're not putting stuff up on the web version of ChatGPT; we're trying to be transparent about our own use of AI and data; and we're being growth-minded, because these are amazing possibilities, but there are challenges and risks here, so let's try to be transparent. There are tons of unanswered questions
in meded and AI, some of which I've listed here. There's a link at the bottom right to some tutorials and examples we make available, if you're interested in more of the AI space. And maybe this is what's going on right now: we're all looking at pieces of the puzzle, and it's hard to see the big picture. I think that's true, but we're hoping to get there. There was a great supplement out in late March, all on precision education, so I encourage you to check out some of those articles. And here are funding sources and collaborators as well. I'll stop there, try to stop screen sharing, and we'll take some questions.
Thank you so much, Dr. Burk-Rafel. I have to say I'm so glad I'll be attending the five o'clock session as well, because there was a lot where I thought, oh, I've got to catch that on the next round. And there's so much; you shared that screen with the boxes, from data to the final box being wisdom, and there's so much in between. I think there are so many concerns about whether we're getting the right data, whether we're using the right information to inform that wisdom. We do have some wonderful questions that were posted in the chat; I'll try to get through them. So one is: to what degree can order phenotypes be explained by shared order sets?
Yeah. So all those trainees you saw work in the same exact system; they have access to all the same order sets. We have some ability to control for admission order sets and understand who's using what, and we control for patient factors (comorbidity, A1C scores prior to admission, all that), and there's still variation. So: some, but not all. If you actually calculate the percent of variance attributable to the individual trainee, it is only 10 to 20%; there's a lot that's related to the patient and the system. But that 10 to 20% still clearly matters a lot.
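[Editor's note: the "percent of variance attributable to the individual trainee" is a variance decomposition. A toy sketch of the idea, using simulated data and a one-way ANOVA estimator of the intraclass correlation, not the study's actual models:]

```python
# Toy sketch (simulated data, not the study's analysis): estimate the share
# of outcome variance attributable to the individual trainee via a one-way
# random-effects decomposition, i.e. an intraclass correlation, ICC(1).
import numpy as np

rng = np.random.default_rng(0)
n_trainees, patients_each = 200, 50

trainee_effect = rng.normal(0, np.sqrt(0.15), n_trainees)        # true share ~15%
patient_noise = rng.normal(0, np.sqrt(0.85), (n_trainees, patients_each))
outcome = trainee_effect[:, None] + patient_noise                # row per trainee

# One-way ANOVA estimator: between-trainee vs. within-trainee mean squares.
grand_mean = outcome.mean()
group_means = outcome.mean(axis=1)
ms_between = patients_each * np.sum((group_means - grand_mean) ** 2) / (n_trainees - 1)
ms_within = np.sum((outcome - group_means[:, None]) ** 2) / (n_trainees * (patients_each - 1))
icc = (ms_between - ms_within) / (ms_between + (patients_each - 1) * ms_within)
print(f"Estimated trainee share of variance: {icc:.0%}")  # close to the simulated 15%
```

In the real analysis this decomposition would sit inside mixed models that also adjust for patient and system factors, which is what makes the residual 10 to 20% trainee share notable.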
Yeah, that's what we see.
Thank you. I'll move on to the next person and then circle back; the first person had three questions. The next is: from the practical viewpoint of a GME program director, how do you match the clinical needs of a given resident to the practical needs of staffing services, and equity to a resident displaced for another resident?
This is the implementation science that I nicely glossed over, right? This is the last-mile challenge: easy for me to say "just assign them a different admission," very hard in real practice, where our residents are employees and are keeping our hospital (probably like yours) afloat. I was a resident not too long ago, so we do struggle with that. I think it's easier when we're proposing something that's a net even balance. Two or three admissions come in within the same hour: do we have the ability to hold them for 30 minutes until we can say who they should be assigned to, or not? That's coming up against throughput, actually more enterprise-related issues, because they want to get patients out of the ED, and there are ED metrics and things like that. So it's been less about the personalization to the residents (people are on board with that) and more about pushing against enterprise metrics that just say throughput, throughput, throughput.
Thank you. Have you been able to connect data on individual resident performance with actual patient outcomes for a given condition?
Well, that's the crux; that's the whole thing, right? And that's where we hope, with those tracers, that we're getting closer. "Outcomes" is a big word. There's mortality, there are distal outcomes, and then there are process measures that are the most proximal ("I did this thing," which is not really an outcome), and there's everything in between. So it spans a spectrum from very process-related stuff to very important distal stuff where the resident has a very small contribution to the outcome,
something we could maybe measure in large-scale studies, and we are trying to. But how would we actually say, "okay, you, resident, contributed to this readmission, and in this amount"? That's the tension with patient outcomes, and why I think the field has struggled. There have been some good critiques coming out of Canada of this very idea that we could link Milestones or other things to outcomes, arguing that maybe it's a fool's errand. Based on my work here at NYU, I do not think it's a fool's errand. I think it's about choosing very thoughtfully what outcomes you're measuring, generally focusing on more process-based measures. And then work from the University of Cincinnati, Eric Warm's group, has really said: engage the residents in a co-production process and a co-decision process. At UC they actually do a defense of the measures at the beginning of every year. They have a very unique residency structure, a long block where residents stay on ambulatory for about 12 months, and at the beginning of that year all the residents come together with the program directors and say: okay, we can measure A1C, we can measure hypertension, we can measure all these things, ranging from very process-based to more distal. They actually vote on what they're going to measure and what they're going to be held accountable for. They get a say, they get ownership, and they also get shared credit if they meet it. That kind of helps start the conversation: yes, there's imperfection with these outcomes, absolutely. You are not 100% responsible for mortality (hopefully you haven't made a huge error where the patient dies), because often it's many patient factors that happened before you. But let's all take responsibility for our patients, strive for the best, and work toward that in a more formative lens. So that's where folks have taken these and said: okay, warts and all, let's aim in a good, positive direction, even with that
tenuous line of attribution. So for example, on two of the things I showed you, insulin ordering and clinical reasoning, it would be so tempting to say we'll link that clinical reasoning to diagnostic error. We're trying, but it's complicated, very complicated from a research lens, because there's reverse causality: the more complex patients, who tend to have worse outcomes, also get the more complex admission notes, because they came in more complex. And I'll tell you, Charlson scores, Elixhauser scores, we have them all; they don't control for all that, they don't capture all that unmeasured confounding. So you'll get paradoxical things, like worse outcomes or worse X alongside better performance here, and it doesn't make sense until you realize it's a proxy measure for the sickness of the patient. Same thing with insulin ordering. So we've been focused more on: can we set some criteria for what we think is important? We're getting closer to patient outcomes and patient-related measures; let's agree that this matters in its own right. You should not be never ordering insulin; that's probably the wrong thing on an internal medicine service. Let's talk about that, talk about your behaviors, and come to more harmony, more high-reliability organization (HRO) principles. So that's how we're approaching that challenging link. Long answer, but I think it's the crux of a lot of this.
Absolutely, very helpful, and it looks at outputs versus outcomes, I guess, a lot of the time. How will we be able to assess resident performance as they begin to use generative text to create assessments and plans?
Yeah. So I would argue that in the future we will not be writing any notes; it will be all scribes, in the inpatient and outpatient setting. And you can react to that in two ways. One: okay, is there still meaningfulness to the note part? Or two: maybe that was never the right thing, and we were using it because it's what we had available, as a
proxy for something more meaningful, like what their cognitive process was, their reasoning, etc. That's exactly what we're using it for right now. We don't argue that H&P notes are really that special a form of documentation; it's just what we have, because we couldn't get trainees to dictate their thought process to us after every case. So I think there will be a future state where either we have access to that ambient data of what was actually happening in the room, the raw data, and understand their thought process, aided by AI, or we will have to refocus our assessment efforts on what really matters. Maybe it will allow us to say: okay, that is no longer a good way of assessing that behavior, and we need to go after it in a more fit-for-purpose way, whether in an OSCE or some other fashion. So I think it's going to force us to ask the hard questions you brought up: what's the right data, what do we focus on, what do we try to collect? Because you can't do it all.
No, right, right. This one's more of a comment: since we operate in teams, the resident who enters the order is not always the decision maker; also, on-call teams and days off all muddy the water.
Absolutely. I will tell you, though, I work closely with Stef Sebok-Syer out of Stanford, who's working on teaming and interdependence, and I think you hit the nail on the head. We've got to get to the recognition that the outcome of care is actually the outcome of multi-team units. There's all this team-science work on how there are actually teams of teams, and the overall outcome of care, the mortality, the readmission, is how all those team units fit together and provide care to our patients. So we're trying to piece out this little individual within this very complex team. You're absolutely right that there's a lot there, and that makes these links sometimes tenuous. That's part of why our tracers focus on
measures where we think the attribution line is stronger. So we focused, for example, on orders where you were the one who placed the vast majority (about 80% or more) of the medication orders (not just any orders, medication orders) within 12 hours of admission. You put in almost all the medications for that patient immediately after they got admitted out of the emergency room, so we said: we think you're on the hook, even though we don't know whether there was a senior resident sitting in the team room saying, "oh no, don't put in that insulin; this is what we do." But now we're set up to say: well, I can also figure out who the senior resident was and who the attending was, layer those on, build nested models, and ask whether we can tease out those effects: how much is due to the attending versus the team versus the individual. So yes, it's a major challenge, and yes, it's something we need to get to, but I think collecting and measuring this stuff will actually get us to a place where we can start to unpack team-based care. My last piece on teams versus individuals: we license, credential, graduate, and advance individuals, and I don't see that going anywhere anytime soon, so we need to do both. You have to also assess individual performance, in the same way baseball used to measure just team-based stuff, wins, and has now atomized performance down to the level of the pitcher's spin rate, because they realized that just counting wins left out who you were playing against, in which ballparks, with which umpires, and with which supporting cast. They realized that wasn't the full picture, so they've gone through that evolution to really understand how the ultimate performance, the wins, relates to the atomic elements of performance, if you will. I think we're in our infancy in meded of doing this.
Yeah. A big part of what I hear you saying throughout this is that the measurement tools
are so critical, and we have to make sure we have the right tools, but also that we recognize what they're measuring.
Yeah, and unfortunately we can get fooled. I haven't presented a lot of validity evidence; for some of these things I may think there's a good signal when actually we're fooled, and it's a proxy for something else. And I didn't mention a lot of qual, because it's not my area of expertise, but qualitative work is incredibly important: actually going to talk to people and understanding the process. It's like the "go see, ask why, show respect" of QI: you've got to go see, you've got to ask why, you've got to show respect. That is so critical for the understanding phase. I typically do quant then qual; some people do qual then quant; but I think the mixed approach is really critical.
Yes. One more question from the chat: will we be able to predict which residents will, and will not or should not, graduate from their residency?
Oh, that's a valid one; I'm not sure I'm going to answer it. I think it gets at value questions related to who should be in our pipeline and what the duty of residency programs is in training folks. I will say my guiding principle is that once you have admitted someone, your task is not just to keep them awesome; your task is to make them better, no matter where they start. You take on that responsibility whether you're a medical school, a residency training program, or a fellowship. So that's part one: the goal is not to admit the best person and keep them the best; it's to grow every person you admit. And of course everyone wants people who will grow in their program. That's the crux of the admissions question: how can I figure out who to admit, who will grow in my program, who will not stagnate or flounder or cause issues or challenges? Whether these kinds of data will allow us to then back-predict or inform the admissions process, that's a
very high-stakes decision, and we have avoided trying to bring it back into admissions for fear that we might be systematically excluding certain people. And again, I don't think there's a net market of people who can go to medical school or residency; there are no extra people, it's the same market, and we're all fighting over the same people. So really we just want optimal alignment between that market and where people end up in schools and residency programs, and to make sure we know what we're getting. So for us it's less about who we should let in and more about how we coach, how we design individual learning plans, etc.
Fabulous. Thank you so much; this has been such an informative session. To all of you: if you want to come back to the 5:00 session, please feel free to do so. It will be a repeat of this session, but it will be a little different; we spice it up every time. Or if you want to encourage your colleagues to attend, please do; the link should be on all of your calendars. Let us know, or if you have more questions you can stay on for a few minutes or come back later this afternoon. Having said that, enjoy the rest of your day, and thank you so much for being here.