Making sense of science: Using LLMs to help reporters understand complex research

11.07.2024 18:18

During the news gathering process, reporters often struggle to understand complex, jargon-heavy documents, particularly in fields like science and technology. For instance, a tech reporter writing a story on a new AI study may have difficulty making sense of specialized terms and concepts from the field. This creates friction when reading the documents to determine what newsworthy angle the study might offer.

To address this challenge, we propose using large language models (LLMs) to identify and define jargon terms within scientific abstracts. These models can be leveraged to wrangle and transform textual input into different formats (e.g., news headlines or summaries based on news article text), and may lend themselves well to the use-case of simplifying complex terms, especially if a definition is available but needs re-writing into more easily understood words. Such systems could also support the contextualization of news articles for readers, as demonstrated by BBC News Labs a few years ago.

This blog post walks through how we built and conducted a preliminary evaluation of a prototype to test out this idea, and describes what we learned in the process.¹ We hope this can help others engaged in similar projects, and extend to systems that support journalists’ sense-making of documents in other domains, such as complex legal or medical documents.

Using retrieval-augmented generation

Retrieval-augmented generation (RAG) is an approach that can enable this kind of simplification of complex terms. It allows a user to input a “query” (i.e., a prompt, a question, or even simply a jargon term), and then matches it against text in a reference document that could be used to help derive a simplified definition. For instance, a jargon term like “precision metric” from a scientific abstract about a novel AI model will likely be found in different sentences within the text of the whole scientific article. (Note: In this post we use “article” to refer to a scientific article rather than a news article.) RAG relies on finding these matching sentences or text snippets and supplying them to an LLM, along with a prompt instructing how the snippets should be used, e.g., to create a summary or to generate a readable definition.
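To make the two RAG steps concrete, here is a minimal Python sketch, not the prototype's actual code. It assumes the simplest possible retrieval (a case-insensitive literal match on the term) and a made-up prompt wording; the function names `retrieve_matching_sentences` and `build_definition_prompt` are hypothetical, and the call to the LLM itself is left out.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter on ., ! or ? followed by whitespace (illustration only)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def retrieve_matching_sentences(term: str, article_text: str) -> list[str]:
    """Retrieval step: keep sentences that literally mention the term."""
    needle = term.lower()
    return [s for s in split_sentences(article_text) if needle in s.lower()]


def build_definition_prompt(term: str, snippets: list[str]) -> str:
    """Generation step input: instructions plus the retrieved snippets.

    The wording is an assumption; the post does not publish its prompt.
    """
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(snippets, start=1))
    return (
        "Using only the excerpts below, write a short, plain-language "
        f'definition of the term "{term}".\n\nExcerpts:\n{numbered}'
    )


# Toy article text, invented for this example.
article = (
    "We report the precision metric for each model. "
    "Training used a single GPU. "
    "The precision metric is the share of predicted positives that are correct."
)
snippets = retrieve_matching_sentences("precision metric", article)
prompt = build_definition_prompt("precision metric", snippets)
print(prompt)  # this string would then be sent to the LLM
```

Literal matching is only a stand-in here; it misses paraphrases and abbreviations, which is one reason similarity-based retrieval is often preferred.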

Two assumptions we made when designing this prototype with a RAG approach were that: (1) the retrieved snippets will actually be informative for creating a definition of the jargon term, and (2) even if the RAG output has minor errors, a human will actually verify the supplied definition if it elicits their interest.

By employing such a RAG approach with GPT-4, we designed a prototype system to provide reporters with clear, concise, and accurate definitions of complex terms. We also designed the prototype to personalize the identification of jargon terms based on a reader’s knowledge level, making it easier for journalists with differing levels of scientific knowledge to parse these articles (e.g. a general interest reporter might need something different than a seasoned science reporter working her specific beat). This prototype was constructed and evaluated in the lab using a sample of 64 peer-reviewed articles published on arXiv Computer Science in March 2024.

Preview of the prototype

The prototype is built as a web app with a list of scientific articles, displaying each article’s metadata and abstract. Jargon terms are highlighted within the abstracts, allowing users to hover over them for instant definitions. Additionally, users can click to access a comprehensive list of all jargon terms from the abstract, along with their definitions. A search bar allows users to find articles of interest, and filters allow users to sift through specific categories of articles as well (we focus on topics in AI, Human-Computer Interaction, and Computing and Society for now).

Building this prototype entailed work on two main problems:

Identifying the jargon, based on a reader’s expertise
Defining the jargon, based on the text of the scientific article

Identifying the jargon

We tackled the challenge of identifying jargon terms in scientific articles using GPT-4, with a prompt template that allowed users to specify their level of scientific expertise. We used this prompt to generate a tailored list of jargon terms for that user.
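As a rough illustration of such a prompt template, the Python sketch below fills a reader's self-described expertise and an abstract into a fixed instruction, then parses a one-term-per-line reply. The template wording, `build_jargon_prompt`, and `parse_term_list` are all assumptions made for this example; the post does not reproduce the actual prompt, and the model call is omitted.

```python
# Hypothetical template; the real prompt used in the prototype is not shown in the post.
JARGON_PROMPT_TEMPLATE = (
    "You are helping a reader with the following background:\n"
    "{expertise}\n\n"
    "List the terms in the abstract below that this reader would likely "
    "consider jargon, one per line.\n\n"
    "Abstract:\n{abstract}"
)


def build_jargon_prompt(expertise: str, abstract: str) -> str:
    """Fill the reader's natural-language expertise and the abstract into the template."""
    return JARGON_PROMPT_TEMPLATE.format(
        expertise=expertise.strip(), abstract=abstract.strip()
    )


def parse_term_list(model_output: str) -> list[str]:
    """Turn a one-term-per-line reply into a clean list without case-insensitive duplicates."""
    terms: list[str] = []
    seen: set[str] = set()
    for line in model_output.splitlines():
        term = line.strip().lstrip("-*• ").strip()  # drop common bullet markers
        if term and term.lower() not in seen:
            seen.add(term.lower())
            terms.append(term)
    return terms


prompt = build_jargon_prompt(
    expertise="I cover local politics and have no formal training in computer science.",
    abstract="We evaluate retrieval-augmented generation using cosine similarity.",
)
print(prompt)

# A made-up model reply, used only to exercise the parser.
print(parse_term_list("- RAG\n- rag\n\n* cosine similarity"))
```

Keeping the expertise description as free text, rather than a fixed scale, mirrors the approach described above, where annotators described their knowledge in natural language.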

This was evaluated with two annotators who manually identified jargon terms in the articles in our sample dataset, and who also supplied their own levels of expertise to the prompt template by describing their knowledge in natural language. The overlap between the resulting sets of jargon terms (computed for each article and each annotator) was used to understand how well GPT-4 performs at this task. Worth noting here is that the two annotators had differing levels of scientific expertise, which was visible in how one annotator consistently identified more jargon terms per abstract than the other.

We find that GPT-4 shows promise in identifying jargon terms, but does tend to identify more terms than the human annotators. It successfully captured most human-identified terms but also incorrectly labeled many words as jargon which the annotators didn’t think were jargon. In information retrieval jargon this equates to a high recall but low precision.
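For readers less familiar with these two measures, the following Python sketch computes them from the overlap between a model's term set and an annotator's term set for a single abstract. The term lists are invented for illustration and are not taken from the study's data; matching terms by lowercased exact string is also an assumption.

```python
def normalize(terms: list[str]) -> set[str]:
    """Lowercase and trim terms so that trivial differences do not count as mismatches."""
    return {t.strip().lower() for t in terms if t.strip()}


def precision_recall(model_terms: list[str], human_terms: list[str]) -> tuple[float, float]:
    """Precision: share of model terms the human also marked.
    Recall: share of human terms the model also found."""
    model, human = normalize(model_terms), normalize(human_terms)
    overlap = model & human
    precision = len(overlap) / len(model) if model else 0.0
    recall = len(overlap) / len(human) if human else 0.0
    return precision, recall


# Invented example: the model finds both human terms plus three extras.
human = ["retrieval-augmented generation", "cosine similarity"]
model = [
    "Retrieval-augmented generation",
    "cosine similarity",
    "prompt",
    "dataset",
    "threshold",
]
p, r = precision_recall(model, human)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.40 recall=1.00
```

In this toy case every human-identified term is recovered (recall of 1.0), but only two of the five model terms are ones the annotator agreed with (precision of 0.4), which is the pattern described above.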

Interestingly, GPT-4 does maintain the relative differences in expertise between the two annotators, identifying more terms as jargon for the less expert annotator. This suggests potential for personalization based on readers’ knowledge levels, and aligns with similar recent findings as well.

Defining the jargon

Once the jargon terms were identified, the next challenge was to provide clear, concise, and accurate definitions. To support our RAG approach we sourced relevant snippets of text for a given jargon term from the complete text of the source article. These are obtained by calculating the similarity of the jargon term to individual snippets of text from the article, and returning snippets that exceed a certain similarity threshold. We use cosine similarity to capture semantic relatedness and chose a threshold of 0.3 (low to medium) to capture a wide range of relevant snippets while excluding irrelevant ones.
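The thresholded retrieval step can be sketched in Python as follows. One loud assumption: the post does not say how text was turned into vectors, so this example uses plain word counts as a stand-in for whatever embedding the prototype used. Scores from word counts are not comparable to scores from a learned embedding, so the 0.3 threshold here is illustrative only, and `retrieve_snippets` is a hypothetical name.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Stand-in 'embedding': lowercase word counts (an assumption, not the post's method)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_snippets(term: str, snippets: list[str], threshold: float = 0.3):
    """Return (score, snippet) pairs whose similarity exceeds the threshold, best first."""
    query = bag_of_words(term)
    scored = [(cosine_similarity(query, bag_of_words(s)), s) for s in snippets]
    return sorted([pair for pair in scored if pair[0] > threshold], reverse=True)


# Invented snippets for illustration.
snippets = [
    "The precision metric counts correct positive predictions.",
    "We trained on a single GPU for two days.",
    "Each metric is averaged over five runs.",
]
results = retrieve_snippets("precision metric", snippets)
for score, snippet in results:
    print(f"{score:.3f}  {snippet}")
```

With these toy inputs only the first snippet clears the threshold; the third shares one word with the query and scores just under 0.3, which shows how sensitive the retrieved set can be to where the cutoff is placed.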

We then used GPT-4² with a query prompt to generate simple and understandable definitions from the retrieved snippets. We also generated another set of definitions for comparison, solely based on providing the abstract to GPT-4, and asking it to infer a definition of a given jargon term based on the text of the abstract. Annotators rated both definitions for a given term on their accuracy, and then recorded a preference for one or the other (or a tie), based on their clarity and informativeness. Pairwise preferences such as these are often used in LLM evaluations.

The accuracy is calculated as a percentage over all definitions generated by a model. The preferences are measured by calculating a win percentage, based on the number of times one approach (Abstract vs. RAG) is preferred over another.
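Both measures are simple proportions, as the short Python sketch below shows. It assumes each definition carries a yes/no accuracy rating and each term carries one preference label, with ties counted in the denominator of the win percentage (consistent with the reported figures, where the remainder were ties). The ratings and labels are toy values, not the study's data.

```python
from collections import Counter


def accuracy_pct(ratings: list[bool]) -> float:
    """Percentage of generated definitions rated accurate."""
    return 100.0 * sum(ratings) / len(ratings) if ratings else 0.0


def win_percentages(preferences: list[str]) -> dict[str, float]:
    """Share of comparisons won by each approach, with ties kept in the total."""
    total = len(preferences)
    if total == 0:
        return {}
    counts = Counter(preferences)
    return {label: 100.0 * counts[label] / total for label in ("abstract", "rag", "tie")}


# Toy annotations for illustration only.
rag_ratings = [True, True, False, True]
preferences = ["abstract", "rag", "tie", "tie", "abstract"]

print(accuracy_pct(rag_ratings))     # 75.0
print(win_percentages(preferences))  # {'abstract': 40.0, 'rag': 20.0, 'tie': 40.0}
```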

Surprisingly, GPT-4 with the abstracts performs a little bit better than GPT-4 with RAG over the article text, both in terms of accuracy (96.6% vs. 93.5%) and win percentage (29.2% vs. 27.8% — the rest were ties). This suggests that more context from the scientific article did not necessarily lead to higher accuracy or better understandability (i.e. our first assumption about using RAG here as described above was not met). A deeper investigation of the retrieved snippets and their actual relevance to the jargon term may help understand if this is an issue with the quality of the context, or if there may be other causes such as the similarity threshold we used. To apply RAG successfully, it’s essential to explore and test different parameters. Without this kind of careful experimentation, RAG on its own might not provide the desired results.

It may also be the case that the large size of GPT-4’s pre-training dataset enables it to draw from other sources to generate definitions. This can be as much a concern as a benefit though — it can make it harder to override irrelevant information from the pre-training data, or eschew its limitations based on cutoff dates for model training.

We also found that the effectiveness of these approaches varied based on the reader’s expertise. For instance, the less experienced annotator found similar value in both methods (higher tie percentage), while the more expert reader noticed more differences. Further evaluation with a larger set of annotators may help to replicate and understand these differences.

Closing notes

This blog post demonstrates a small, scoped experiment in using GPT-4 with RAG to generate definitions of jargon terms, in service of improving the experience of reading complex documents during the news gathering process. We find that GPT-4 performs fairly well at identifying jargon based on a reader’s expertise, although it does tend to over-predict a bit.

Contrary to expectations, we also find that GPT-4 with RAG over an article’s text performs a little worse in terms of accuracy and clarity/informativeness of generated definitions, when compared to GPT-4 with just the context of the article’s abstract. This finding raises further questions worth considering when using LLMs to support text generation and transformation in the newsroom, including how to evaluate RAG-oriented systems. In addition, our exploration of including expertise in the prompt to identify jargon suggests a potentially valuable pattern for journalists seeking to emulate this experience in their own workflow: providing a natural language description of expertise and knowledge in the domain can help steer the model to be more helpful in identifying jargon.

Ultimately, the utility of such a prototype is contingent on how users actually incorporate it into their workflows, and if it actually saves time and improves readers’ comprehension in practice. We would love to know if you have tested out such RAG-based tools in your own newsroom, and how they have been perceived, received, and used!

Sachita Nishal is a Ph.D. student in human-computer interaction and AI at Northwestern University. Eric Lee, an undergraduate computer science student at Northwestern, also contributed to this article. This piece originally ran on Generative AI in the Newsroom.

Illustration by Anton Grabolle used under a Creative Commons license.

¹ Our dataset and web-app are publicly available to support others interested in experimenting with our approach.
² Specifically, we used GPT-4 Turbo in May 2024.
