WordStat by Provalis Research

WordStat Features

Quickly extract valuable insights from unstructured data

With WordStat, data analysts can quickly extract valuable text analytics results from large collections of documents such as customer feedback, emails, open-ended responses, interview transcripts, incident reports, patents, legal documents, blogs, and websites.

Import from many sources

Create projects from a wide range of data sources:

Import from documents and reference tools

  • Documents: Word, PDF, HTML, PowerPoint, RTF, TXT, XPS, ePUB, ODT, WordPerfect
  • Data files: Excel, CSV, TSV, Access, SPSS, and Stata
  • Email platforms: Outlook, Gmail, and Mbox
  • Reference management tools: EndNote, Zotero, Mendeley

Import from online resources

  • Social media services: Twitter, Facebook, Reddit, RSS feeds, YouTube
  • Web survey platforms: Qualtrics, SurveyMonkey, SurveyGizmo, QuestionPro, Voxco, Triple-S
  • Qualitative software: NVivo, ATLAS.ti, QDPX files

Import graphics

  • Graphic types: BMP, WMF, JPG, GIF, PNG
  • Automatically extract information associated with those images, such as geographic location, title, description, authors, and comments, and turn it into variables

Import multiple languages

  • Single-byte languages, e.g. English, French, Spanish, German, and many more
  • Double-byte languages: Chinese, Japanese, Thai
  • Left-to-right and right-to-left languages

Import from a folder

  • Monitor a specific folder and automatically import any documents and images stored in it
  • Monitor changes to the original source files or online services

Organize your data

Several features allow you to easily organize your data in ways that make your analysis process straightforward:

  • Quickly group, label, sort, add, or delete documents, or find duplicates.
  • Assign variables to your documents manually or automatically using the Document Conversion Wizard, e.g. date, author, or demographic data such as age, gender, or location.
  • Easily reorder, add, delete, edit, and recode variables.
  • Filter cases based on variable values.

Quickly extract meaning using Explorer Mode

Quickly and easily extract meaning from large amounts of text data using Explorer mode, designed for those with little text mining experience.

Identify the most frequent words and phrases, and extract the most salient topics in your documents with the topic modeling tool. At any time, you can switch to Expert mode, which gives you access to all of WordStat’s features.

Explore document content using Text Mining

In a few seconds, explore the content of large amounts of unstructured data and extract insightful information:

  • Extract the most frequent words, phrases, and expressions.
  • Quickly extract themes using clustering or 2D and 3D multidimensional scaling on either words or phrases.
  • Easily identify all keywords that co-occur with a target keyword by using the Proximity Plot.
  • Explore relationships among words or concepts with the Link Analysis feature.
  • Fine-tune the analysis by applying the keyword co-occurrence criterion (within a case, a sentence, a paragraph, a window of n words, a user-defined segment) as well as clustering methods (first and second-order proximity, choice of similarity measures).
  • Explore the similarity between concepts or documents using hierarchical clustering, multidimensional scaling, link analysis, and proximity plot.
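As a rough illustration of the "window of n words" co-occurrence criterion mentioned above, here is a minimal Python sketch. It is not WordStat's implementation: the function name `cooccurrences`, the simple regex tokenizer, and the sample text are all assumptions made for the example.

```python
import re
from collections import Counter

def cooccurrences(text, target, window=5):
    """Count words found within `window` tokens of each occurrence of `target`."""
    # Assumption: a naive lowercase tokenizer is good enough for the illustration.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts

sample = "The battery died fast. Battery life is poor, but the screen is great."
print(cooccurrences(sample, "battery", window=2).most_common(3))
```

Widening `window` moves the criterion from "adjacent words" toward "same sentence or paragraph", which is the kind of trade-off the co-occurrence setting controls.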

Use Topic Modelling to extract the most salient topics

Get a quick overview of the most salient topics in very large text collections using state-of-the-art automatic topic extraction. It combines natural language processing with statistical analysis (non-negative matrix factorization, NNMF, or factor analysis) and applies it not only to words but also to phrases and related words, including misspellings.

In hierarchical cluster analysis, a word may appear in only one cluster. Topic modeling, by contrast, may associate a word with more than one topic, which more realistically represents the polysemous nature of some words as well as the multiplicity of contexts in which words are used.
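To make the idea concrete, the sketch below factorizes a tiny document-by-term count matrix with the generic multiplicative-update form of non-negative matrix factorization. It is an assumption-laden toy, not WordStat's algorithm: the helper names, the four-term vocabulary, and the counts are invented for the example. Each row of `h` is a topic, and a term such as "bank" is free to carry weight in more than one row.

```python
import random

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def frob_error(v, w, h):
    """Squared reconstruction error between v and the product w*h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor v (docs x terms) into w (docs x k) and h (k x terms), all non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Update h, then w, using multiplicative updates.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h

# Hypothetical term counts: "bank" occurs in both finance and river documents.
terms = ["bank", "loan", "river", "water"]
v = [[2, 3, 0, 0], [1, 2, 0, 0], [2, 0, 3, 1], [1, 0, 2, 2]]
w, h = nmf(v, k=2)
for topic in h:
    print([f"{t}:{x:.2f}" for t, x in zip(terms, topic)])
```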

Explore connections

Explore connections among words or concepts using a network graph. Detect underlying patterns and structures of co-occurrences using three layout types: multidimensional scaling, a force-based graph, and a circular layout.

Graphs are interactive and may be used to explore relationships and to retrieve text segments associated with specific connections.

Relate text with structured data

Explore relationships between unstructured text and structured data:

  • Identify temporal trends, differences between subgroups, or assess relationships with ratings or other kinds of categorical or numerical data with statistical and graphical tools (deviation table, correspondence analysis, heatmaps, bubble charts, etc.).
  • Assess the relationship between word occurrence and nominal or ordinal variables using different association measures: Chi-square, Likelihood ratio, Tau-a, Tau-b, Tau-c, symmetric Somers’ D, asymmetric Somers’ Dxy and Dyx, Gamma, Pearson’s R, Spearman’s Rho.
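For readers unfamiliar with these measures, the following sketch computes the Pearson chi-square statistic for a contingency table of word occurrence by subgroup. The counts are made up, and the function is a textbook calculation rather than WordStat's own code.

```python
def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts. Rows: documents with / without the word "refund".
# Columns: two customer subgroups.
table = [[30, 10],
         [20, 40]]
print(round(chi_square(table), 2))  # 16.67
```

A value near zero means the word occurs at about the same rate in both subgroups; larger values indicate a stronger association.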

Categorize your text data using dictionaries

Achieve full-text analysis automation using existing dictionaries or create your own categorization model of words and phrases.

In the dictionary, one can implement Boolean (AND, OR, NOT) and proximity rules (NEAR, AFTER, BEFORE) and use Regular Expression formulas to quickly extract specific information from text data.
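The sketch below shows, in plain Python, what Boolean, proximity, and regular-expression rules of this kind do. It does not use WordStat's dictionary syntax: the category names, the `near` helper, and the `ORD-12345` order-number pattern are hypothetical.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def near(tokens, a, b, distance):
    """True if words a and b occur within `distance` tokens of each other."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

def categorize(text):
    tokens = tokenize(text)
    words = set(tokens)
    categories = set()
    # Boolean rule: "refund" AND NOT "received"
    if "refund" in words and "received" not in words:
        categories.add("REFUND_REQUEST")
    # Proximity rule: "battery" NEAR "dead", within 3 words
    if near(tokens, "battery", "dead", 3):
        categories.add("BATTERY_FAILURE")
    # Regular expression: a hypothetical order number format
    if re.search(r"\bORD-\d{5}\b", text):
        categories.add("HAS_ORDER_NUMBER")
    return categories

print(sorted(categorize(
    "I want a refund for order ORD-12345, the battery was dead on arrival.")))
```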

Dictionary-moderated lemmatization and stemming are available in several languages, and an automatic word substitution option allows you to substitute several words with a target keyword. User-defined stop word lists are available in several languages to keep nonessential frequent words such as he, she, it, etc. out of the analysis.

Get unique assistance for dictionary building

Get truly unique computer assistance for taxonomy building, with tools for extracting common phrases and technical terms and for quickly identifying misspellings and related words (synonyms, antonyms, holonyms, meronyms, hypernyms, hyponyms) in your text collection.

Automatically classify your text data using machine learning

Develop and optimize automatic document classification models using Naïve Bayes and K-Nearest Neighbours. Users can select from several validation methods: leave-one-out, n-fold cross-validation, and split sample. An experimentation module can be used to easily compare predictive models and fine-tune classification models.
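As a sketch of what such a model and its validation involve, here is a small multinomial Naive Bayes classifier with add-one smoothing, evaluated by leave-one-out validation. It is a generic textbook version written for illustration, not WordStat's classifier, and the six labeled documents are invented.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train(docs):
    """docs: list of (text, label). Returns the counts Naive Bayes needs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab

def predict(model, text):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def leave_one_out_accuracy(docs):
    """Train on all documents but one, test on the held-out one, and repeat."""
    hits = 0
    for i, (text, label) in enumerate(docs):
        model = train(docs[:i] + docs[i + 1:])
        hits += predict(model, text) == label
    return hits / len(docs)

docs = [
    ("great screen and great battery", "pos"),
    ("love the fast delivery", "pos"),
    ("great value love it", "pos"),
    ("battery died terrible support", "neg"),
    ("terrible screen broke fast", "neg"),
    ("slow delivery terrible value", "neg"),
]
model = train(docs)
print(predict(model, "terrible battery"))
print(leave_one_out_accuracy(docs))
```

With so few documents the leave-one-out estimate is very noisy; the point is only to show the mechanics of holding out each document in turn.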

Classification models may be saved to disk and applied later in QDA Miner, in a standalone document classification utility, in a command-line program, or through a programming library.

Return to the source document in one click

Verify or dig deeper into your analysis by going back to the text from almost any feature, chart, or graph using Keyword Retrieval or Keyword-in-Context to retrieve sentences, paragraphs, or whole documents. This is particularly helpful when building taxonomies or for word-sense disambiguation.

The retrieved text segments can be sorted by keyword or any independent variable. You can attach QDA Miner codes to retrieved segments or export them to disk in tabular format (Excel, CSV, etc.) or as text reports (MS Word, RTF, etc.).
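A Keyword-in-Context listing can be approximated in a few lines of Python. The sketch below is a simplified stand-in, not WordStat's retrieval engine; the `kwic` function and the sample sentence are assumptions. It shows why such a view helps with word-sense disambiguation: the two senses of "bank" are obvious once the neighbouring words are visible.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each keyword hit."""
    tokens = re.findall(r"\w+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

text = "The bank approved the loan. We walked along the river bank at dusk."
for left, key, right in kwic(text, "bank", width=2):
    print(f"{left:>12} [{key}] {right}")
```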

Perform qualitative coding

Combine WordStat with a state-of-the-art qualitative coding tool (QDA Miner) for more precise exploration of data or a more in-depth analysis of specific documents or extracted text segments when needed.

Transform unstructured text into interactive maps (GIS mapping)

Relate unstructured text data with geographic information and create interactive plots of data points, thematic maps, and heatmaps. A geocoding web service transforms location names, postal codes, and IP addresses into latitude and longitude.

Automatically extract names and misspellings

Automatically extract named entities (names, technical terms, product and company names) that can be added to the categorization dictionary using an easy drag-and-drop operation.

Misspellings and unknown words are automatically extracted and matched with existing entries in the user dictionary and may be quickly added to the dictionary.
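One simple way to match unknown words against existing dictionary entries is approximate string matching. The sketch below uses Python's standard `difflib` module for that purpose; it only illustrates the idea and makes no claim about how WordStat performs the matching. The function name, the cutoff of 0.8, and the word lists are assumptions.

```python
import difflib

def match_misspellings(unknown_words, dictionary_entries, cutoff=0.8):
    """Map each unknown word to its closest dictionary entry, if one is close enough."""
    suggestions = {}
    for word in unknown_words:
        close = difflib.get_close_matches(
            word.lower(), dictionary_entries, n=1, cutoff=cutoff)
        if close:
            suggestions[word] = close[0]
    return suggestions

entries = ["battery", "screen", "delivery"]
print(match_misspellings(["battrey", "xyzzy"], entries))
```

Lowering `cutoff` catches more distant misspellings at the cost of more false matches, so suggestions of this kind are best reviewed before they are added to a dictionary.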

Export results

Export text analysis results to common industry file formats such as Excel, ASCII, HTML, XML, and MS Word, to popular statistical analysis tools such as SPSS and Stata, and to image formats such as PNG, BMP, and JPEG.

Transform text using Python scripts

Use Python scripts and the full range of open-source Python libraries to preprocess or transform text documents for analysis in WordStat.
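The kind of preprocessing such a script might perform is sketched below using only the Python standard library. How a script is connected to WordStat is not described here, so the example is a standalone function; the name `clean_document` and the specific cleaning steps are assumptions chosen for illustration.

```python
import html
import re
import unicodedata

def clean_document(raw):
    """Strip markup, URLs, and irregular whitespace from one document."""
    text = html.unescape(raw)                    # &nbsp;, &amp;, ... to characters
    text = re.sub(r"<[^>]+>", " ", text)         # drop HTML tags
    text = unicodedata.normalize("NFKC", text)   # unify compatibility characters
    text = re.sub(r"https?://\S+", " ", text)    # drop URLs
    text = re.sub(r"\s+", " ", text)             # collapse whitespace
    return text.strip()

print(clean_document("<p>Great&nbsp;phone!</p>  See https://example.com/x  now"))
```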

Content analysis and text mining software for fast and precise processing of large amounts of unstructured information.