Women Framed
Women Framed is one of our long-running programs. The idea is to build multi-regional datasets that help journalists and researchers understand how women are framed in coverage of violent events, whether they appear as victims or as perpetrators.
We began this work through a collaboration with the Egyptian Observatory for Journalism and Media, producing a mini-dataset of around 500 articles published on five Egyptian Arabic-language digital platforms between November 15, 2024 and December 15, 2024, coinciding with last year’s 16 Days Against Violence campaign. We are now working with additional partners to expand the datasets to cover more data, more countries, and more regions across the Global South.
Stories and products powered by this project will be posted here.
Project Main Information
-
We scraped data directly from five domains: Mada Masr, Al Youm 7, Al Masry Al Youm, Al Wafd, and Al Ahram. After collecting the raw data, we filtered the news stories using a combination of machine-learning algorithms and human review. This process narrowed more than 100,000 articles down to 455 news stories specifically related to violent crimes either committed against women or perpetrated by women.
-
The analysis used Association Rule Mining (ARM) via the Apriori algorithm, applied separately to headlines and full article bodies to account for their different linguistic properties. Headlines were processed with moderate thresholds to allow meaningful but concise co-occurrence patterns to emerge, while body texts—larger and more lexically diverse—were analyzed using higher support values (5–7%) and a confidence threshold of 20% to isolate only stable, recurring associations. A multi-stage filtering process removed rules dominated by stopwords, and lift values were used to distinguish substantive patterns from random co-occurrences. Each retained rule was further assigned a topic label based on curated keyword sets, with unmatched cases categorized as “Other,” ensuring that the final outputs reflect interpretable and content-driven linguistic structures.
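To make the steps above more concrete, here is a minimal, self-contained sketch of Apriori-style rule mining in plain Python. It is an illustration under stated assumptions, not the project's actual pipeline: the toy documents, the stopword list, the topic keyword sets, the function names, and the lift cutoff of 1.0 are all hypothetical, and the toy support value is far higher than the 5–7% used for body texts because the sample has only five documents.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is the set of tokens in one text.
docs = [
    {"woman", "killed", "husband", "the"},
    {"woman", "killed", "court", "the"},
    {"woman", "arrested", "court", "the"},
    {"husband", "killed", "woman", "the"},
    {"court", "sentenced", "man", "the"},
]
STOPWORDS = {"the"}  # assumption: a real list would be much longer
TOPICS = {  # assumption: illustrative keyword sets, not the curated ones
    "Domestic violence": {"husband", "wife"},
    "Judiciary": {"court", "sentenced"},
}

def frequent_itemsets(docs, min_support, max_len=2):
    """Level-wise Apriori: grow itemsets only from frequent smaller ones."""
    n = len(docs)
    counts = Counter(item for d in docs for item in d)
    current = {frozenset([i]): c / n for i, c in counts.items() if c / n >= min_support}
    frequent, k = {}, 1
    while current:
        frequent.update(current)
        if k == max_len:
            break
        k += 1
        items = sorted({i for s in current for i in s})
        # Keep a candidate only if every (k-1)-subset is already frequent.
        candidates = [frozenset(c) for c in combinations(items, k)
                      if all(frozenset(s) in current for s in combinations(c, k - 1))]
        current = {}
        for cand in candidates:
            support = sum(cand <= d for d in docs) / n
            if support >= min_support:
                current[cand] = support
    return frequent

def mine_rules(docs, min_support, min_confidence, min_lift=1.0):
    """Return labeled rules as (antecedent, consequent, support, confidence, lift, topic)."""
    frequent = frequent_itemsets(docs, min_support)
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        # Drop rules dominated by stopwords (here: at least half of the items).
        if 2 * len(itemset & STOPWORDS) >= len(itemset):
            continue
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(sorted(itemset), r)):
                cons = itemset - ante
                confidence = support / frequent[ante]
                lift = confidence / frequent[cons]
                # Lift above 1 suggests more than chance co-occurrence.
                if confidence >= min_confidence and lift > min_lift:
                    topic = next((name for name, kw in TOPICS.items() if itemset & kw), "Other")
                    rules.append((ante, cons, support, confidence, lift, topic))
    return rules

labeled = mine_rules(docs, min_support=0.4, min_confidence=0.2)
for ante, cons, support, confidence, lift, topic in labeled:
    print(sorted(ante), "->", sorted(cons),
          f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f} topic={topic}")
```

On a real corpus, headlines and bodies would be passed through `mine_rules` separately, each with its own thresholds; for body texts that would mean a `min_support` between 0.05 and 0.07 and a `min_confidence` of 0.20, as described above.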
-
Yes, the dataset is accessible to all journalists, researchers, and activists interested in the topic. However, we prefer that anyone interested in accessing the dataset first take a workshop with us to learn about ARM analysis and the Apriori algorithm before mining our datasets. It’s not mandatory, but we strongly encourage it. If you are interested, please reach out by email at contact@anmat.media, or sign up using this form: https://forms.gle/HQjY379aEyYJt3J7A
-
The dataset is available on Kaggle. You can access it from here, or from the Anmat account on Kaggle.

