


SimpleText Workshop – GDR IA (2021)

Simplification and Popularization of Scientific Texts

Organizers

  • Liana Ermakova, HCTI, Univ. Bretagne Occidentale
  • Josiane Mothe, IRIT, INS2i
  • Eric Sanjuan, LIA, INS2i

  • Executive Committee: Frédéric Bimbot

Themes

  • Natural Language Processing, Research, Scientific Journalism, Science Popularization

Data Used

  • Scientific Articles, Scientific Articles Abstracts, Wikipedia Articles, Science Journalism Articles

Scientific Context

Scientific literacy, including on health-related questions, is important for people to make the right decisions, evaluate the quality of information, maintain physical and mental health, and avoid spending money on useless items.

For example, the stories individuals find credible can determine their response to the COVID-19 pandemic, including whether they practice social distancing, use dangerous fake medical treatments, or hoard goods. Unfortunately, stories on social media are easier for lay people to understand than research papers. Scientific texts such as scientific publications can also be difficult to understand for people who are not domain experts, including scientists working outside the publication's domain.

Improving text comprehensibility and adapting texts to different audiences remains an unresolved problem. Although datasets such as WebSplit and WikiSplit exist, they reduce automated text simplification to the task of “Split and Rephrase” (Aharoni and Goldberg, 2018; Botha et al., 2018; Narayan et al., 2017). Another existing dataset is based on Simple Wikipedia (Coster and Kauchak, 2011). There have been some attempts to tackle text comprehensibility, but they are mainly based on readability formulas, which have not been convincingly shown to reduce the difficulty of text (Collins-Thompson and Callan, 2004; Leroy et al., 2013; Flesch, 1948; Gunning, 1968; Si and Callan, 2001). Recent studies use Transformer models such as BERT to simplify sentences (Fang and Stevens, 2019; Maruyama and Yamamoto, 2019; Zhao et al., 2018). Unlike previous work, the SimpleText workshop targets the lack of background knowledge, which can be a major impediment to understanding scientific text (O’Reilly et al., 2019).
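
To make concrete what a readability formula measures, here is a minimal Python sketch of the Flesch Reading Ease score (Flesch, 1948), which combines average sentence length with average syllables per word. It is an illustration only, not part of the workshop's tooling: the function names are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for simplicity.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as groups of consecutive vowels.

    This heuristic is an assumption made for the sketch; a real system
    would use a pronunciation dictionary or a better estimator.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    # Naive sentence and word splitting, sufficient for illustration.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    print(flesch_reading_ease("The cat sat on the mat."))
```

The score depends only on surface features (sentence length and word length), so it says nothing about whether the reader has the background knowledge a text assumes.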

Improving text comprehensibility and adapting texts to different audiences brings societal, technical, and evaluation challenges.

SimpleText is linked to a wide range of important societal challenges. Open science is one of them. Making research truly open and accessible to everyone implies providing it in a form that is readable and understandable (Fecher and Friesike, 2014).

SimpleText also tackles technical challenges related to data (passage) selection and summarization, and to the comprehensibility and readability of texts. SimpleText aims to provide an overview of these technical challenges by bringing together the relevant scientific disciplines.

The use, usability, and evaluation of simplified texts are also among the topics to be covered in the SimpleText workshop.

Scientific popularization is linked to scientific journalism. Unlike the MaDICS MADONA action (Maîtriser l’Analyse interactive de DOnnées pour la NArration journalistique), which aims at article creation based on structured data analysis, the SimpleText workshop aims at article creation based on textual data (scientific publications and their abstracts).

Bibliography

  • Aharoni, R., Goldberg, Y., 2018. Split and Rephrase: Better Evaluation and a Stronger Baseline. arXiv:1805.01035 [cs].
  • Botha, J.A., Faruqui, M., Alex, J., Baldridge, J., Das, D., 2018. Learning To Split and Rephrase From Wikipedia Edit History, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Presented at the EMNLP 2018, Association for Computational Linguistics, Brussels, Belgium, pp. 732–737. https://doi.org/10.18653/v1/D18-1080
  • Collins-Thompson, K., Callan, J., 2004. A Language Modeling Approach to Predicting Reading Difficulty. Proc. HLTNAACL 4.
  • Coster, W., Kauchak, D., 2011. Simple English Wikipedia: a new text simplification task, in: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. pp. 665–669.
  • Fang, F., Stevens, M., 2019. Sentence Simplification with Transformer-XL and Paraphrase Rules 10.
  • Fecher, B., Friesike, S., 2014. Open science: one term, five schools of thought, in: Opening Science. Springer, Cham, pp. 17–47.
  • Flesch, R., 1948. A new readability yardstick. J. Appl. Psychol. 32, 221–233.
  • Gunning, R., 1968. The technique of clear writing. McGraw-Hill.
  • Hasler, E., de Gispert, A., Stahlberg, F., Waite, A., Byrne, B., 2017. Source sentence simplification for statistical machine translation. Comput. Speech Lang. 45, 221–235. https://doi.org/10.1016/j.csl.2016.12.001
  • Inui, K., Fujita, A., Takahashi, T., Iida, R., Iwakura, T., 2003. Text simplification for reading assistance: a project note, in: Proceedings of the Second International Workshop on Paraphrasing - Volume 16, PARAPHRASE ’03. Association for Computational Linguistics, USA, pp. 9–16. https://doi.org/10.3115/1118984.1118986
  • Leroy, G., Endicott, J.E., Kauchak, D., Mouradi, O., Just, M., 2013. User evaluation of the effects of a text simplification algorithm using term familiarity on perception, understanding, learning, and information retention. J. Med. Internet Res. 15, e144.
  • Maruyama, T., Yamamoto, K., 2019. Extremely Low Resource Text simplification with Pre-trained Transformer Language Model. Int. Conf. Asian Lang. Process. 6.
  • Narayan, S., Gardent, C., Cohen, S.B., Shimorina, A., 2017. Split and Rephrase, in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Presented at the EMNLP 2017, Association for Computational Linguistics, Copenhagen, Denmark, pp. 606–616. https://doi.org/10.18653/v1/D17-1064
  • O’Reilly, T., Wang, Z., Sabatini, J., 2019. How Much Knowledge Is Too Little? When a Lack of Knowledge Becomes a Barrier to Comprehension. Psychol. Sci. https://doi.org/10.1177/0956797619862276
  • Si, L., Callan, J., 2001. A statistical model for scientific readability. Proc. Tenth Int. Conf. Inf. Knowl. Manag. 574–576.
  • Siddharthan, A., 2002. An architecture for a text simplification system. Presented at the Proceedings of the Language Engineering Conference 2002 (LEC 2002).
  • Zhao, S., Meng, R., He, D., Saptono, A., Parmanto, B., 2018. Integrating Transformer and Paraphrase Rules for Sentence Simplification, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Presented at the EMNLP 2018, Association for Computational Linguistics, Brussels, Belgium, pp. 3164–3173. https://doi.org/10.18653/v1/D18-1355

Goals of the Workshop and Main Planned Activities

Our main goal is to gather an interdisciplinary scientific community around these subjects. We want to arrive at the most complete definition possible of the simplification and popularization of scientific texts. We plan to organize study days to discuss the challenges linked to the simplification and popularization of scientific texts:

  • Societal
  • Linguistic
  • Evaluation
  • Technical

At the same time, we want this community to take on shared challenges such as an evaluation campaign on the simplification of scientific texts. We are considering three different tasks:

  • Study of basic knowledge
  • Automated creation of an abstract from several documents on a subject
  • Scientific text simplification

This activity will be carried out by multidisciplinary teams using, for example, scientific publications taken from the ISTEX platform (https://www.istex.fr/), which provides its services through a web API, or other collections that the organizers will suggest to the community. Access to all API resources is allowed for research activities without a prior authorization request. We are also considering using Wikipedia articles and scientific journalism articles.