Standardization of Wikipedia articles according to the lexical constancy of their introductions and body texts
Ludovic BOCKEN, PhDs (c) - INTP-T
Innovation, Governance, Enterprise Architecture, (Generative) Artificial Intelligence, Knowledge Engineering Specialist | Drummer
Wikipedia is a prolific encyclopedia of unequal quality. To standardize it, qualitative categories are used to classify certain articles. Article length has been identified as one of the best predictor variables, but it is an insufficient criterion for standardizing the cognitive accessibility of Wikipedia. This research measures how constantly vocabulary is repeated between an article's introduction and its body. Our reproducible methodology draws largely on supervised classification methods and similarity metrics. Wikipedia's application programming interface (API) and the quanteda software library are used to collect and measure articles from two quality categories: the positive category of featured articles and the negative category of articles needing rewrite. After idempotence tests, the K and Vm metrics are selected and applied to the texts. A complementary measure is formalized as the difference between independent measurements. Models of combinatorial properties are then evaluated. Decision trees give an overview. The performance of aggregated models for each metric is then compared using support vector machines (SVMs). The K and Vm metrics appear to be better candidates than length for normalizing Wikipedia, with K appearing to be the more discriminating of the two.
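The abstract does not spell out how K is computed; in quanteda it is available through `textstat_lexdiv` as Yule's K, defined as 10^4 · (Σᵢ i²·Vᵢ − N) / N², where N is the number of tokens and Vᵢ the number of types occurring exactly i times. A minimal Python sketch of that formula, plus a hypothetical `constancy_difference` helper standing in for the "difference between independent measurements" described above (the exact operationalization in the article is not shown here):

```python
from collections import Counter

def yules_k(tokens):
    """Yule's K = 10^4 * (sum_i i^2 * V_i - N) / N^2,
    where V_i is the number of types occurring exactly i times.
    Higher K means more repetitive vocabulary."""
    n = len(tokens)
    freqs = Counter(tokens)        # type -> frequency
    vi = Counter(freqs.values())   # frequency i -> number of types V_i
    s2 = sum(i * i * v for i, v in vi.items())
    return 1e4 * (s2 - n) / (n * n)

def constancy_difference(intro_tokens, body_tokens, metric=yules_k):
    """Hypothetical complementary measure: the metric computed
    independently on the introduction minus the metric on the body."""
    return metric(intro_tokens) - metric(body_tokens)

# Example: a fully repetitive text scores higher than a varied one.
print(yules_k(["wiki"] * 4))              # 7500.0
print(yules_k(["wiki", "wiki", "page"]))  # ~2222.22
```

Tokenization, stop-word handling, and the aggregation into per-category models (decision trees, SVMs) are additional steps the article describes but that this sketch deliberately omits.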
Please let me know if you are interested in this kind of research and/or if you would like a talk about it. Thanks for sharing!
5y — You might be interested in this: https://www.dhirubhai.net/pulse/wikim-r-package-measure-wikipedia-ludovic-bocken-phds-c- :)
Prof. Dr. in Computational Linguistics
5y — Hi Ludovic! Where could I see the whole article?
Language Processing, Information Retrieval, Robotics, Data!
5y — Hi Ludovic, have you checked my paper about the Arabic Wikipedia? https://ieeexplore.ieee.org/document/6987558 If you don't have access, please let me know.
Principal Software Engineer at Wikimedia Foundation
5y — Hi Ludovic! Are you familiar with the ORES project? https://www.mediawiki.org/wiki/ORES Wikimedia is very interested in automated quality assessment, especially for detecting vandalism, but also for surfacing more subtle problems. Aaron Halfaker is the research scientist on the project.
Chief Customer Officer at CarbonCatalyst
5y — Interesting viewpoint. We are trying to create a Wikipedia site at the moment. Quite hard.