The corporaexplorer concept is simple, yet has the potential to enhance research work with text collections in a powerful way:

- Input: The ability to filter the corpus and/or highlight documents, based on search patterns (in main text or metadata, including date range).
- Corpus visualisation: An interactive heat-map of the corpus, based on the search input (calendar heat-map or heat-map where each tile represents one document, optionally grouped by metadata properties).
- Document visualisation and display: Easy navigation to and within full-text documents with pattern matches highlighted.
- Document retrieval: Extraction of subsets of the corpus in a format suitable for close reading.

corporaexplorer contains three main functions:

- prepare_data() converts a data frame or character vector to a “corporaexplorerobject”.
- explore() runs the package’s core feature, a Shiny app for fast and flexible exploration of a “corporaexplorerobject”.
- run_document_extractor() runs a Shiny app for simple retrieval/extraction of documents from a “corporaexplorerobject” in a reading-friendly format.

While collecting and preparing the text collections to be explored requires some familiarity with R programming, using the Shiny apps for exploring and extracting documents from the corpus should be fairly intuitive also for those with no programming knowledge, once the apps have been set up by a collaborator. Thus, the aim is for the package to be useful for anyone with a rudimentary knowledge of R, or with collaborators who have such knowledge. Finally, by offering a convenient way to explore any character vector, it can also be useful for a wide range of other R users.

Many text collections can be freely shared, e.g. documents from governmental websites (such documents are usually equipped with a Creative Commons license or similar). An example corpus can be downloaded for testing here: this is a “corporaexplorerobject” based on a document collection scraped from a Russian governmental website (7751 documents, more info here). The equivalent material in Russian (10,221 documents) can be downloaded here (but see the note on character encoding below). A third, somewhat bigger (but less carefully prepared) test corpus (26,654 documents) from the same website is available here. (The package of course works equally well with text collections that cannot be freely shared, e.g. a collection of newspaper articles etc.)
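As a rough illustration of how the three main functions fit together, here is a minimal sketch. It assumes the corporaexplorer package is installed; the `texts` vector is a hypothetical toy input made up for this example, and default arguments are used throughout. It is not meant as a recommended workflow, only as a picture of the call sequence.

```r
library(corporaexplorer)

# Hypothetical toy input: a plain character vector, one element per document.
texts <- c(
  "First toy document about text collections.",
  "Second toy document mentioning Shiny apps.",
  "Third toy document for close reading."
)

# prepare_data() converts the character vector to a "corporaexplorerobject".
corpus <- prepare_data(texts)

# Both apps are interactive Shiny apps, so only launch them in an
# interactive R session. Each call blocks until its app is closed.
if (interactive()) {
  explore(corpus)                 # explore the corpus
  run_document_extractor(corpus)  # retrieve documents for close reading
}
```

A data frame can be passed to prepare_data() instead of a character vector when the documents come with metadata; the columns it expects are not covered in this sketch.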
Corporaexplorer is an R package that uses the Shiny graphical user interface framework for dynamic exploration of text collections.

corporaexplorer is designed for use with a wide range of text collections: one example could be a collection of tens of thousands of documents scraped from a governmental website; another example could be the collected works of a novelist; a third example could be the chapters of a single book.

corporaexplorer’s intended primary audience is qualitatively oriented researchers who rely on close reading of textual documents as part of their academic activity, but the package should also be a useful supplement for those doing quantitative textual research and wishing to visit the texts under study.

This issue is happening on my work computer. We recently upgraded to Windows 10, and I am currently running Internet Explorer 11. Prior to the upgrade to Windows 10 I had no issues; however, after the upgrade a specific web page I use for work is displaying incorrect characters on most of the links and buttons of the page.

I have not noticed the issue on most web pages; however, for a vital web page that we use to conduct business, it shows random characters instead of the text that should be there for the link.

Example: There is a tool bar that has options such as clicking "filter" to filter search results. However, my "filter" button shows a "F" as the link. Options beside it such as "clear all" and "Grid on/off" instead show "w" and "j" for some odd reason. It just seems like the browser is displaying random characters instead of what it should be. There is even a link on the page that is a " ) " instead of the correct word it should be.

My colleagues all have my exact same setup and none of them are having issues. One colleague said sometimes he notices weird characters as soon as he loads the page, but it corrects itself to the proper page display after a second or two.

Has anyone had similar issues before or have any idea what could be causing this?