As far as I know, Stata by default applies a finite-sample ("few clusters") correction to reduce the bias of the cluster-robust variance matrix estimator. Data from cluster sampling procedures should contain a variable that denotes the cluster to which each case belongs (this cluster is often called the "primary sampling unit"). What problems can arise in estimating your standard errors when you cluster them at the ID level?

Other users have suggested the user-written program stcrprep, which also offers additional features. The standard Stata command stcrreg can handle this structure by modelling standard errors that are clustered at the subject level.

The dataset we will use to illustrate the various procedures is imm23.dta, which was used in Kreft and de Leeuw's Introducing Multilevel Modeling. This dataset has 519 students clustered in … Stata can automatically include a set of dummy variable f…

When you have panel data, with an ID for each unit repeating over time, and you run a pooled OLS in Stata, such as:

    reg y x1 x2 z1 z2 i.id, cluster(id)

And how does one test whether clustered errors are necessary? I know there's a package in R that does it, but R is not exactly my favorite program.

That is, you are not guaranteed to be on the safe side if the different standard errors are numerically similar. … (e.g. where data are organized by unit ID and time period) but can come up in other data with panel structure as well (e.g. firms by industry and region).

However, my dataset is huge (over 3 million observations) and the computation time is enormous.

More examples of analyzing clustered data can be found on our webpage Stata Library: Analyzing Correlated Data. All you need to do is add the option robust to your regression command.
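
For what it's worth, here is a minimal sketch of the options mentioned above, side by side. It is only an illustration: it assumes Stata's bundled example panel nlswork (loaded with webuse) as a stand-in for the poster's data, and the variables ln_wage, age, tenure and idcode take the place of y, the regressors and id. As far as I know, robust and cluster(id) are the older spellings of vce(robust) and vce(cluster id).

```stata
* Assumption: nlswork is a stand-in example panel, not the data discussed above
webuse nlswork, clear
xtset idcode year

* Pooled OLS with heteroskedasticity-robust standard errors
regress ln_wage age tenure, vce(robust)

* Pooled OLS with standard errors clustered on the panel ID
regress ln_wage age tenure, vce(cluster idcode)

* Within (fixed-effects) estimator: removes the unit effects without
* estimating one dummy coefficient per ID, as i.idcode would
xtreg ln_wage age tenure, fe vce(cluster idcode)
```

With millions of observations, the last form may be worth trying before the dummy-variable regression, since it does not have to build and estimate a coefficient for every ID. Whether it helps enough on a given dataset is something to check rather than assume.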