The next speaker in this session at the 2026 International Communication Association conference in Cape Town is Rupert Kiddle, whose interest is in encoder-produced news embeddings. These are increasingly widely used to analyse and categorise news articles, both for internal journalistic purposes and for scholarly research. But such embeddings are not very sensitive to differences over time, and instead engage in a kind of temporal averaging; this can be addressed, but remains difficult.
Most models also remain opaque about their training data and weighting approaches, so there is a need to develop new approaches. This project draws on GDELT for its data; uses the Nomic Contrastors repository for its fully open embedding pipeline; and implements a training task called NewsCycle, which is designed to produce temporally sensitive embeddings.
GDELT is used to collect English-language news from 2020 to 2025, selecting some 2,200 articles per day and applying a range of filtering steps. The result of the NewsCycle task can then be tested by querying for specific content at particular points in time while hiding the timestamps of articles from the retrieval system; ideally, it will still select appropriate articles and reject those which represent news content from different timeframes, based only on the temporally distinct embeddings it has generated.
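
To make the logic of such a timestamp-blind test concrete, here is a minimal sketch in Python. It is not the project's actual evaluation code: the function name `temporal_precision_at_k`, the cut-off `k`, and the tolerance `window_days` are all hypothetical choices made for illustration. Articles are ranked by cosine similarity of their embeddings alone, and the timestamps are consulted only afterwards, to score how many of the retrieved articles fall close to the time the query refers to.

```python
import numpy as np


def temporal_precision_at_k(query_vecs, query_days, doc_vecs, doc_days,
                            k=5, window_days=7):
    """Share of top-k retrieved documents dated within `window_days` of the query.

    All names and defaults here are illustrative assumptions, not part of
    NewsCycle. Days are plain numbers (e.g. days since a reference date).
    """
    query_vecs = np.asarray(query_vecs, dtype=float)
    doc_vecs = np.asarray(doc_vecs, dtype=float)
    query_days = np.asarray(query_days, dtype=float)
    doc_days = np.asarray(doc_days, dtype=float)

    # Normalise rows so that the dot product equals cosine similarity.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)

    # Retrieval step: ranking uses the embeddings only, never the timestamps.
    sims = q @ d.T
    top_k = np.argsort(-sims, axis=1)[:, :k]

    # Scoring step: timestamps are revealed only now, to judge the results.
    gaps = np.abs(doc_days[top_k] - query_days[:, None])
    return float((gaps <= window_days).mean())
```

A score close to 1 would suggest that the embeddings themselves carry enough temporal signal to separate news from different timeframes; a score close to the share of in-window articles in the whole collection would suggest that they do not.
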
To date this work is English- and text-only, but it can be extended further; additional evaluation on a broader range of tasks beyond document retrieval is also necessary.











