Stata Programming Resources

(Just FYI, this page is primarily meant for my graduate students, but the rest of the internet is also free to personally judge me if it wants.)

Some basic thoughts on Stata

I do most of my statistical programming in Stata. The only exceptions are optimization and text analytic programs, for which I use Python, and a couple of econometric techniques for which the R packages are more convenient. Stata has made my life easier so often that I have an affection for it of the kind normally reserved for one's dog. And I've got a pretty great dog.

There are a few places in which Stata is deficient, but I usually find it simpler to code an entire project in Stata rather than using a separate language for things like data assembly. Stata's data manipulation tools are often maligned, but they are actually quite good once you get used to the basic paradigm. I frankly find the complete data manipulation package in Stata more powerful and intuitive than SAS, with the sole exception being that calling SQL queries directly is more straightforward in SAS.

Pro-tip: it's not worth switching for that. If you want SQL, set up an ODBC connector. If you want more sophisticated merge tools for .dta files, write your own wrapper or use a prewritten one like mmerge. Running optimization routines is also more cumbersome in Stata than in a more general-purpose language like Matlab or Python, and often requires a drop into Mata. Mata is becoming progressively more powerful with each release, however, so it's worth looking into.
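Once an ODBC data source is configured, pulling a query into Stata takes only a couple of lines. This is a rough sketch rather than a recipe: the data source name "mydb", the table "orders", and the column names are all hypothetical placeholders, and the connection setup itself depends on your operating system and database driver.

```stata
* List the ODBC data sources the operating system knows about
odbc list

* Pull the result of a SQL query straight into memory.
* "mydb", "orders", and the column names are hypothetical placeholders.
odbc load, exec("SELECT customer_id, order_date, amount FROM orders WHERE amount > 0") dsn("mydb") clear

* From here on it is ordinary Stata
describe
```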

Stata Programming Resources

The Kit Baum book “An Introduction to Stata Programming” is very good. The library has a copy, but it’s also a reasonably priced textbook and it’s worth having as a resource.

Princeton and UCLA have excellent Stata resources:

The Princeton resources are better for explaining data manipulation and programming. The UCLA resources are better for understanding basic statistical operations. 

Wisconsin has a primer that’s not bad:

The Stata Corporation itself has a lot of good resources. They offer online “NetCourses” on various topics in Stata. They aren't super cheap, but having gone through one, I do think they are worth the money. The course I took did an especially good job of going over “best practices” that often get left out of more straightforward technical resources. Stuff like using -assert- statements to validate your data, which has saved my butt countless times since then and which I otherwise would never have known what to do with.
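To give a flavor of what validating data with -assert- looks like, here is a minimal sketch. It uses the auto dataset that ships with Stata, not anything from the NetCourse, and the specific checks are just illustrations. Each -assert- is silent when the condition holds for every observation and stops the do-file with an error when it does not.

```stata
* Load an example dataset that ships with Stata
sysuse auto, clear

* Each car should appear exactly once
isid make

* Price must be present and positive
assert !missing(price)
assert price > 0

* The repair record is either missing or a rating from 1 to 5
assert inrange(rep78, 1, 5) if !missing(rep78)
```

The point is to put checks like these right after every merge or import, so that a bad assumption about the data halts the program instead of quietly contaminating the results.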

Also, the Stata manuals themselves are often overlooked. They’re free in PDF form. They’re obviously useful as a reference, but the more general sections are written to be an instructional tool as well. They are very well documented and provide both excellent high-level overviews of the techniques used, replete with formally written estimating equations, and excellent source documentation for further study.

Outputting Tables From Stata

Particular attention should be given to creating tables from Stata output. There is only one answer here: Use the estout command suite.

I cannot stress this enough. The estout package is fantastically powerful, and the -esttab- wrapper command produces bulletproof, production quality tables for 95% of cases. All of my programs spit out very nice looking, fully formatted tables directly into my paper. I never have to touch them by hand. If I change something in my estimation program, I just rerun it and the table output is rewritten automatically. It's that powerful. If you are copying text from the command window into your tables using Copy-Paste, you are doing it wrong. For your own sanity, stop it.
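A minimal sketch of that workflow, assuming the estout package is installed from SSC and again using the auto dataset that ships with Stata. The output filename results.rtf and the particular regressions are placeholders for illustration, not taken from any of my own programs.

```stata
* estout is user-written; install it once with: ssc install estout
sysuse auto, clear

* Store each set of estimates as it is run
eststo clear
eststo: regress price mpg
eststo: regress price mpg weight foreign

* Write one formatted table with both models side by side.
* results.rtf is a hypothetical filename.
esttab using results.rtf, replace se label star(* 0.10 ** 0.05 *** 0.01) title("Price regressions")
```

Rerunning the do-file overwrites the table, which is the whole appeal. As far as I know, -esttab- picks the output format from the file extension, so using a .tex or .csv name instead should give LaTeX or CSV output.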

There is also -outreg- and its follow-up -outreg2-. Development on -outreg- has apparently restarted, and John Gallup is updating it again, but it still feels a bit too basic. -outreg2- is more fully featured, but I wouldn't use it, at least not for final production. Roy Wada has done and continues to do a fantastic job of updating -outreg2-, but I still feel it sacrifices too much flexibility in the name of simplicity and compactness. That simplicity and compactness is useful when doing early exploratory work interactively, but the power and flexibility of the estout suite makes it my primary choice. This is especially true for people who are willing to dirty their hands with a little programming, since estout handles passthrough options so well. It is also exceedingly well documented.

As a final note, I am obviously an estout "fanboy", so take my opinions for what they are. But then, it's the internet, so you kind of knew that already.