Social scientists are currently exploring sophisticated text-mining tools for big data analysis. One class of tools attracting considerable attention is named entity recognizers, which detect social actors in text and classify them as persons or organizations. However, it remains a technical challenge to automatically disambiguate social actors (who is referred to in a text?) and to specify them (which demographic characteristics do they have?). JEnExtrA is a reliable and accurate software architecture for social scientists who want to automatically detect, disambiguate, and demographically specify social actors in big data. The architecture draws on the online encyclopedia Wikipedia as its data source.
For a more detailed description of the software, please refer to our article, which also includes a demonstration example:
Poschmann, Philipp & Goldenstein, Jan (2019): Disambiguating and Specifying Social Actors in Big Data: Using Wikipedia as a Data Source for Demographic Information, Sociological Methods & Research 51(2), 887-925.
If you use the software in your own research, please cite our article.