Monday, February 4, 2013

Stata and $\LaTeX$: Descriptive Statistics

For as long as I've been statisticulating, I've sought to (seamlessly) export results from the statistical software program into a file-format readable (and usable) by non-statisticians.  In my earliest days --- an undergraduate econometrics course using SAS --- I just copied-and-pasted the results from the SAS output window into a MS Word document, fixed the formatting and spacing, then saved the Word file and left it at that.  This method was dreadfully inefficient and prone to error but something I came to view as a necessary (if not wholly enjoyable) part of the statistical analysis process.  Fortunately, the days of copy-and-paste are getting further and further behind us.

SAS has a rich and expansive Output Delivery System (ODS) that can send virtually any SAS output to an RTF or PDF file.  I used SAS ODS a fair amount in my last job and don't recall having any major beefs with it.  With Stata, however, there isn't any corporate-developed output delivery system that sends results to, say, MS Word or Adobe Acrobat.  You can create a "log file" in Stata that logs all your output (sans graphs) and commands in a Stata-proprietary format (.SMCL) or in a text file (.TXT), but since this log also includes the commands and comments used to generate the results, it isn't ideal for sending to a non-statistician.  In spite of this limitation, I was (for a time) converting my .SMCL files into HTML documents and sending the HTML document (liberally commented to make it somewhat self-explanatory) when results needed to be circulated.  This was acceptable but not ideal.  It wasn't until I started using $\LaTeX$ that sending results directly to PDF started to make sense.  Better still, I wanted to generate the $\LaTeX$ code from within Stata and have it written directly to a text or $\LaTeX$ file.  Although I'm still working through the best way to do this, I think what I have so far is a decent start.

First, I open a $\LaTeX$ file (in WinEdt, actually) and include everything (e.g., the preamble) up to the first \section{...} statement.
 
\documentclass[11pt]{article}
\newcommand{\bs}{$\backslash$}
\thispagestyle{empty}

%Packages
\usepackage[letterpaper,left=2.5cm,right=2.5cm,top=2.5cm,bottom=2.5cm]{geometry}
\usepackage{booktabs,tabularx,epsfig,graphicx,epstopdf,pdflscape}
\usepackage{ragged2e}
\usepackage{parskip}           % no indent for each paragraph but vertical space instead
% Paragraph indentation (reset to 0mm after \begin{document})
\setlength{\parindent}{10mm}


\begin{document}

\RaggedRight
\parindent=0mm


In my Stata .do file, I use a -local- statement to store the path of the text file that will hold the soon-to-be-created $\LaTeX$ code.  Each time I need to add to that file, I use a series of -file open-, -file write-, and -file close- commands to open, write to, and close it.  The Stata results are grabbed and formatted for inclusion in a $\LaTeX$ file using various user-written commands, Ian Watson's -tabout- being the one I've primarily used for descriptive statistics.

For example,

* **macro out text file to collect all LaTeX code and comments
local stats `"`"C:\Documents and Settings\stats.txt"'"'

 
file open stats using `stats', write replace text
file write stats "\section{Descriptive Statistics}" _n
file write stats "Statistics that follow are for the N=100 sample." _n(2)
file close stats

* **rank --- frequency distribution
file open stats using `stats', write append text
file write stats "Academic Rank"
file close stats

tabout rank using `stats', append oneway cells(freq col cum) format(0 1) ///
clab(No. Col_% Cum_%) style(tex) bt font(bold) topf(top.tex) botf(bot.tex) topstr(14cm) botstr(.)

* **index score, overall and by rank --- summary statistics
file open stats using `stats', write append text
file write stats "Index Score:  Overall and by Academic Rank"
file close stats

quietly oneway h_pre95 rank
local p = trim("`: display %9.4f (Ftail(`r(df_m)', `r(df_r)', `r(F)')) '")

tabout rank using `stats', append sum oneway ///
cells(N h_pre95 min h_pre95 max h_pre95 median h_pre95 mean h_pre95 sd h_pre95) ///
format(0 2) clab(.) style(tex) bt font(bold) topf(top.tex) botf(bot.tex) ///
topstr(14cm) botstr(One-way ANOVA, p = `p')
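
The p-value macro in the second table is fairly dense, so here is the same calculation unpacked into separate steps.  This is only a sketch: the scalar name pval is made up for illustration, and it assumes -oneway- has just been run so that r(F), r(df_m), and r(df_r) are still available.

```stata
* Sketch only: the p-value calculation from above, one step at a time
quietly oneway h_pre95 rank

* Upper-tail probability of the F statistic (numerator df, denominator df, F)
scalar pval = Ftail(r(df_m), r(df_r), r(F))

* Apply a four-decimal display format and store the result as text
local p = string(pval, "%9.4f")

* Check the text that will be passed to botstr()
display "One-way ANOVA, p = `p'"
```

Either form leaves the formatted p-value in the local macro p, ready for the botstr() option of -tabout-.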


A couple of comments on the above snippet of .do file code.  First, the first -file open- uses the replace option, which starts the text file afresh on each run, whereas the later -file open- statements use append so that each new piece is added to the end of the file.  Second, the first -tabout- produces a one-way frequency distribution and the second -tabout- produces select summary statistics of a continuous variable stratified by a categorical variable.  Frustratingly, I haven't figured out how to generate a non-stratified table of summary statistics with -tabout-, although I'm sure there is a simple and straightforward means of doing so.  For a more detailed and helpful explanation of the features and capabilities of -tabout-, see Ian Watson's help documentation.
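
One possible workaround for the non-stratified table sidesteps -tabout- rather than solving the problem within it: run -summarize- with the detail option and write the table rows with -file write-.  The sketch below is untested against the files above and rests on a few assumptions: the local macro stats and the variable h_pre95 are defined as in the earlier code, the booktabs package is loaded (it is in the preamble above), and \tabularnewline is used in place of the usual double backslash simply to keep backslashes to a minimum inside Stata strings.

```stata
* Sketch only: overall (non-stratified) summary statistics without -tabout-
quietly summarize h_pre95, detail

* Format the stored results as text
local n    = string(r(N),    "%9.0f")
local min  = string(r(min),  "%9.2f")
local max  = string(r(max),  "%9.2f")
local med  = string(r(p50),  "%9.2f")
local mean = string(r(mean), "%9.2f")
local sd   = string(r(sd),   "%9.2f")

* Append a small booktabs table to the same text file
file open stats using `stats', write append text
file write stats "Index Score:  Overall" _n(2)
file write stats "\begin{tabular}{rrrrrr}" _n
file write stats "\toprule" _n
file write stats "N & Min & Max & Median & Mean & SD \tabularnewline" _n
file write stats "\midrule" _n
file write stats "`n' & `min' & `max' & `med' & `mean' & `sd' \tabularnewline" _n
file write stats "\bottomrule" _n
file write stats "\end{tabular}" _n(2)
file close stats
```

The table will not match the styling that top.tex and bot.tex give the -tabout- tables, so treat it as a stopgap rather than a replacement.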

Given the number and variety of (exceptionally smart) Stata users out there, I suspect there are many methods of varying elegance that have been devised for exporting results to a non-Stata format.  This is one.  And one likely to evolve with my experience and needs. 
