Stata 19 - Statistical Software New Version

Machine learning via H2O: Ensemble decision trees

Machine learning methods are often used to solve research and business problems focused on prediction when the problems require more advanced modeling than linear or generalized linear models. Ensemble decision tree methods, which combine multiple trees for better predictions, are popular for such tasks. H2O is a scalable machine learning platform that supports data analysis and machine learning, including ensemble decision tree methods such as random forest and gradient boosting machine (GBM).

The new h2oml suite of Stata commands is a wrapper for H2O that provides end-to-end support for H2O machine learning analysis using ensemble decision tree methods. After using the h2o commands to start a new H2O cluster or connect to an existing one, you can use the h2oml commands to fit GBM and random forest models for regression and classification problems. The h2oml suite offers tools for hyperparameter tuning, validation, cross-validation, evaluating model performance, obtaining predictions, and explaining these predictions. For example,

Initiate H2O from within Stata

. h2o init

Import data from Stata into H2O

. _h2oframe put, into(dataframe) current

Perform gradient boosting binary classification, and tune the number of trees and the learning rate

. h2oml gbbinclass response predictors, ntrees(20(10)200) lrate(0.1(0.1)1)

Assess variable importance

. h2omlgraph varimp

Make predictions

. _h2oframe change newdata

. h2omlpredict outcome_pred

And there’s much more.
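The ranges in ntrees(20(10)200) and lrate(0.1(0.1)1) read like Stata's usual start(step)end numlist shorthand. As a rough sketch, assuming the options expand the same way the numlist command does and that every pairing of the two lists is tried, the size of the search can be checked before fitting anything. The local macro names n_trees and n_lrate are illustrative only.

```stata
* Expand each range with Stata's numlist command (start(step)end)
numlist "20(10)200"
local n_trees : word count `r(numlist)'

numlist "0.1(0.1)1"
local n_lrate : word count `r(numlist)'

* Report how many candidate values each option contributes
display "ntrees values: `n_trees'"
display "lrate values:  `n_lrate'"

* Assumption: an exhaustive grid fits one model per pairing
display "candidate models: " `n_trees' * `n_lrate'
```

Knowing the count up front helps decide whether to narrow a range or widen the step before starting a long tuning run.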

Do-file Editor: Autocompletion, templates, and more

The Do-file Editor has the following additions:

Autocompletion of variable names, macros, and stored results. If you pause briefly as you type, suggestions of variable names from data in memory, macros, and stored results will appear in addition to the command names and existing words that appeared previously.

Do-file Editor templates. You can now save time and ensure consistency when you create new documents in the Do-file Editor by using Stata templates and user-defined templates.

Do-file Editor current word and selection highlighting. The Do-file Editor will now highlight all case-insensitive occurrences of the current word under the cursor and all case-sensitive occurrences of the current selection.

Bracket highlighting. The Do-file Editor will now highlight the brackets enclosing the current cursor position as you move through the document.

Code folding enhancements. You can now quickly fold all foldable blocks of code in your do-file by using the Fold all menu item. You can then selectively unfold your code one fold point at a time to show the more important parts of your do-file, or you can use the Do-file Editor’s Unfold all menu item to unfold every fold point. You can also select lines of code and transform them into a foldable block of code by using the Fold selection menu item. This can tidy up your code and increase the code’s readability. In addition, code folding is now less visually distracting: arrow markers in the code-folding ribbon indicate whether a code fold is expanded or collapsed, and markers for expanded folds are hidden unless you hover the mouse over the ribbon.

Do-file Editor temporary and permanent bookmarks. The Do-file Editor now supports temporary bookmarks in addition to permanent bookmarks. The existing permanent bookmarks are saved as part of the do-file. The new temporary bookmarks let you navigate your do-file immediately without changing its content.

Show whitespace and tabs. The Do-file Editor can now show whitespace characters only within a selection instead of always showing them or not showing them at all.

Navigator panel. The Navigation control from previous releases of Stata has been replaced by the Navigator panel. It displays a list of permanent bookmarks and programs that are in a do-file. You can quickly jump to the position of a program or bookmark by double-clicking on the item in the Navigator panel. You can also delete and indent bookmarks from the Navigator panel.

Tables: Easier tabulations, exporting, and more

Stata 19 also includes many additions that allow users to more easily create and customize tables.

Titles, notes, and exporting for tables. The table command is a flexible tool for creating tabulations, tables of summary statistics, tables of regression results, and more. It now allows you to add a title with the new title() option, to add a note with the new note() option, to control the appearance of the title and notes with the new titlestyles() and notestyles() options, and to export your table to your preferred document type (Word, LaTeX, Excel, etc.) with the new export() option.
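A minimal do-file sketch of the title(), note(), and export() options named above, using the auto dataset that ships with Stata. The title text, note text, and the filename table1.docx are made up for illustration, and the replace suboption inside export() is an assumption based on how Stata's other export commands behave, so check help table for the exact syntax.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Mean mileage by origin and repair record, with a title and a note,
* written straight to a Word document
* (assumption: export() accepts filename, replace)
table (foreign) (rep78), statistic(mean mpg) nformat(%6.1f) ///
    title("Table 1. Mean mileage by origin and repair record") ///
    note("Data: auto.dta, shipped with Stata.") ///
    export(table1.docx, replace)
```

The point of the sketch is that tabulating, annotating, and exporting fit in a single command.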

Easier ANOVA tables. You can now more easily create and customize ANOVA tables after anova and oneway by collecting the new stored matrix r(ANOVA). You can use the new anova collection style to easily format these results in a standard ANOVA-style layout.

Better labels with collect get. With the new commands() option of collect get, you can specify the names of the commands that posted the results being consumed. This allows collect get to search for command-specific result labels. The results in the collection will often have better labels, matching those you would get if the collect prefix were used instead of the collect get command.

Determine layout of a collection. The new collect query layout command allows you to query a collection's layout specification. Previously, users typed collect layout to display both the layout and the table. Now you no longer need to see the full table each time you want to see the layout.
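A short sketch of the difference, assuming collect query layout takes no further arguments when it is applied to the current collection. The regression and its variables are arbitrary and come from the auto dataset shipped with Stata.

```stata
* Build a small collection from a regression
sysuse auto, clear
collect clear
collect: regress mpg weight displacement

* Setting the layout also displays the resulting table
collect layout (colname) (result[_r_b _r_se])

* Report only the layout specification, without redrawing the table
collect query layout
```

This matters most for large collections, where redrawing the table just to recall the row and column specification is slow and noisy.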

Control factor variables in headers. With collect style header's new option fvlevels(), you have more control over how factor variables appear in a table. Specify whether to hide or show factor-variable levels in row and column headers.

Remove results from a collection. The new collect unget command allows you to remove selected results from a collection. This can make it easier to lay out tables that do not involve these results.

Table-specific notes. The collect notes command has the new fortags() option that allows you to control which table should show the specified note.

Tabulations with measures of association and tests. You can now easily create customized tables with the results of tabulate and svy: tabulate. With the new collect() option, tabulated statistics are stored in a collection with its own layout and styles, which can be further customized and exported to a variety of file types. This is particularly useful when you wish to include cumulative percentages or measures of association from tabulate and tests from svy: tabulate.

List of New Features of Stata 19

Machine learning via H2O: Ensemble decision trees

Conditional average treatment effects (CATE)

High-dimensional fixed effects (HDFE)

Bayesian variable selection for linear regression

Marginal Cox PH models for interval-censored multiple events data

Meta-analysis for correlations

Correlated random-effects (CRE) model

Panel-data vector autoregressive (VAR) model

Bayesian bootstrap and replicate weights

Control-function linear and probit models

Bayesian quantile regression via asymmetric Laplace likelihood

Inference robust to weak instruments

SVAR models via instrumental variables

Instrumental-variables local-projection IRFs

Mundlak specification test

Latent class model-comparison statistics

Do-file Editor: Autocompletion, templates, and more

Graphics: Bar graph CIs, heat maps, and more

Tables: Easier tabulations, exporting, and more

Stata in French

And More