For your independent analysis project, you will answer a question using data and regression
tools learned this quarter. You will produce a very brief high-level policy memo detailing your
findings, and provide between 2 and 5 additional items (at least one figure and one table, plus up to 3
optional items) to present your findings in more detail.
In all cases, you must do cross-sectional analysis (i.e., you should not try to do time series or
panel analysis, as we have not covered these techniques).
Part of your job will be to figure out how to best address your topic using a cross-sectional
framework. You are not required to do any outside research but you may.
Management & Trade: World Bank Business Enabling Environment
Imagine that you are a public servant in a fictional country that is democratic and has an
economy very similar to that of Mexico. In fact, you have just been appointed as the new Head
of the National Institute for Entrepreneurship, which was modeled on INADEM, an institute
created by the Mexican government to promote entrepreneurship. Using data from the World
Bank Business Data and World Development Indicators, what do you think would be the best
strategy to foster small and medium enterprises in your country for entrepreneurship-driven
economic growth? You can focus this analysis in any way you like within the bounds of the
project guidelines and can make any necessary assumptions about your country (but be clear
2 Guidelines
You are responsible for 2 deliverables (both of which will be submitted
electronically via Canvas):
• MEMO: A 2000-word maximum policy memo presenting your research and conclusions. You
will find it hard to summarize everything you have done in this amount of space. But you must
concisely describe how you used data to answer the question at hand, your main findings, and
the fundamental limitations to your analysis in clear language for a reader that may not
understand regression analysis. (Any references you include will not count against your page
total.)
– SUPPORTING INFORMATION: Appended to the end of your memo you will include 2
to 5 additional items (Figures or Tables) that help present your research. You must have at least
one table that presents your main analysis results and one figure (your choice!); the rest are
optional and for you to determine. These figures and tables should include captions that let
them stand alone. These items should be clearly labeled and you should reference them from
• SCRIPT: An .R script that replicates all of the analysis for your memo and supporting
information. Please comment your script so that we can easily navigate your code (e.g., #
Generate Figure 1: Interaction Effects).
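As an illustration only, a commented script might be organized along the following lines. This is a minimal sketch, not a required template: the data are simulated stand-ins, and the variable names (days_to_start, new_firms) and the output file name are hypothetical.

```r
# ---- Setup -------------------------------------------------------------
set.seed(2024)  # fix the seed so any simulated values replicate exactly

# ---- Load data ---------------------------------------------------------
# ASSUMPTION: simulated stand-in data. A real script would read your own
# data file here. All variable names below are hypothetical.
n <- 100
dat <- data.frame(days_to_start = rexp(n, rate = 1 / 20))
dat$new_firms <- 10 - 0.1 * dat$days_to_start + rnorm(n)

# ---- Generate Table 1: Main regression results -------------------------
m1 <- lm(new_firms ~ days_to_start, data = dat)
table1 <- round(coef(summary(m1)), 3)  # estimates, SEs, t values, p values
print(table1)

# ---- Generate Figure 1: Motivating scatterplot -------------------------
fig_path <- file.path(tempdir(), "figure1.pdf")  # written to a temp folder
pdf(fig_path, width = 7, height = 5)
plot(dat$days_to_start, dat$new_firms,
     xlab = "Days required to start a business",
     ylab = "New firms per 1,000 adults (simulated)",
     main = "Figure 1: Entry barriers and firm creation")
abline(m1, lwd = 2)  # overlay the fitted regression line
dev.off()
```

Section headers that match the item labels in the memo (Table 1, Figure 1, and so on) make it easy to find the code behind each result.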
Your analysis should have the following general structure:
1. Motivation and Theoretical Underpinnings: You should begin by presenting the motivating
question, or set of questions, and a clear explanation of the theory guiding your analysis. WHY
are you doing what you are doing? Are there intellectual schools of thought that guide your
intuition? What hypothesis (or hypotheses) are you testing and what do you expect to find?
To test whether small and medium enterprises will spur economic growth, I want to
gather data on different business sectors to see which sector drives the most
growth. My assumption is that in countries similar to Mexico, economic growth will be
strongest in international trade and in sectors involving agriculture. The
reasoning behind my intuition is that many countries want to import produce at a
lower price than they would pay for domestically grown produce. My hypothesis is that there will be a strong
positive correlation between economic growth and activity in agricultural sectors.
2. Data Selection: You must explain how you use the data to test your hypothesis and answer
the question at hand. What are the data you are using and why can they help you answer the
question of interest? What is (are) the dependent (outcome) variable(s)? What is (are) the
independent variable(s) of interest? This should include a discussion of case selection in light of
your theory: explain why you are using the subset of data you are using (both in terms of
observations and variables). You should also include a concise description of any data
manipulations/variables you have generated (how and why). Anyone who reads your paper and
looks at your R script should be able to easily replicate your analysis.
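For example, building a cross-section from country-year data and documenting each manipulation might look roughly like the sketch below. The data frame, its values, and the variable names (gdp_pc, days_to_start) are invented for illustration and do not come from the World Bank files.

```r
# ASSUMPTION: a tiny made-up country-year panel standing in for real data.
panel <- data.frame(
  country       = rep(c("A", "B", "C", "D"), each = 3),
  year          = rep(2017:2019, times = 4),
  gdp_pc        = c(9000, 9200, 9500,  4000, 4100, NA,
                    15000, 15400, 15900,  700, 720, 760),
  days_to_start = c(8, 8, 7,  30, 28, 25,  5, 5, 4,  60, 55, 50)
)

# Case selection: keep one year so that each country appears exactly once.
cross <- subset(panel, year == 2019)

# Drop observations with missing values in the analysis variables, and
# record how many were lost so the memo can report it.
n_before  <- nrow(cross)
cross     <- cross[complete.cases(cross[, c("gdp_pc", "days_to_start")]), ]
n_dropped <- n_before - nrow(cross)

# Generated variable: log GDP per capita, so that differences are read in
# proportional rather than absolute terms.
cross$log_gdp_pc <- log(cross$gdp_pc)

print(cross)
cat("Observations dropped for missing data:", n_dropped, "\n")
```

Reporting the number of dropped observations alongside the reason for choosing the year helps a reader judge whether the remaining sample is still representative.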
3. Methodology / Explanation of Model(s): You should present your model(s) with clear
justifications for your variable selection and the functional form of your variables, including any
interaction terms. What are you controlling for, and why?
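One way to express a model with a logged control and an interaction term in R is sketched below. Everything here is an assumption made for illustration: the data are simulated, and the names credit_access, low_income, gdp_pc, and sme_growth are hypothetical.

```r
set.seed(42)

# ASSUMPTION: simulated cross-section of 120 hypothetical countries.
n <- 120
dat <- data.frame(
  credit_access = runif(n, 0, 100),      # index from 0 to 100
  low_income    = rbinom(n, 1, 0.4),     # 1 = low-income country
  gdp_pc        = exp(rnorm(n, 9, 1))    # GDP per capita, right-skewed
)
dat$sme_growth <- 1 + 0.03 * dat$credit_access +
  0.02 * dat$credit_access * dat$low_income +
  0.5 * log(dat$gdp_pc) + rnorm(n)

# credit_access * low_income expands to both main effects plus their
# interaction; log(gdp_pc) enters as a control in logged form.
m_int <- lm(sme_growth ~ credit_access * low_income + log(gdp_pc), data = dat)
b <- coef(m_int)

# With an interaction, the slope on credit_access differs by group.
slope_other <- b["credit_access"]
slope_low   <- b["credit_access"] + b["credit_access:low_income"]

print(round(coef(summary(m_int)), 3))
cat("Slope for low-income countries:", round(slope_low, 3), "\n")
cat("Slope for other countries:     ", round(slope_other, 3), "\n")
```

Note that the coefficient on credit_access alone is the slope only for the group where low_income equals 0, which is why the two group-specific slopes are computed separately.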
4. Regression Analysis and Results: Your main analysis should be a series of regressions
testing the impact of your independent variable(s) of interest on your outcome variable. All
models should be reported in a clearly-labeled regression table in your supporting information.
Explain the progression of your analysis clearly (e.g., adding other variables; testing
interactions, etc.). Use graphics and simulations where appropriate. Discuss which model(s)
have the strongest statistical and practical significance. Interpret the meaning of your
coefficients in a useful manner and discuss the goodness of fit of your model.
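A progression of models can be estimated and summarized side by side, roughly as in the sketch below. It uses simulated data and hypothetical variable names (new_firms, days_to_start, log_gdp_pc, trade_open), and it builds the comparison with base R only; a table-formatting package may produce a more polished result.

```r
set.seed(7)

# ASSUMPTION: simulated cross-section; names and values are illustrative.
n <- 150
dat <- data.frame(
  days_to_start = rexp(n, rate = 1 / 20),
  log_gdp_pc    = rnorm(n, 9, 1),
  trade_open    = runif(n, 20, 150)
)
dat$new_firms <- 5 - 0.05 * dat$days_to_start + 0.8 * dat$log_gdp_pc +
  0.01 * dat$trade_open + rnorm(n)

# Progression: start simple, then add controls one at a time.
models <- list(
  "(1) Bivariate" = lm(new_firms ~ days_to_start, data = dat),
  "(2) + Income"  = lm(new_firms ~ days_to_start + log_gdp_pc, data = dat),
  "(3) + Trade"   = lm(new_firms ~ days_to_start + log_gdp_pc + trade_open,
                       data = dat)
)

# One row per model: coefficient of interest, its SE, fit, and sample size.
results <- do.call(rbind, lapply(models, function(m) {
  s <- summary(m)
  data.frame(
    coef_days = unname(coef(m)["days_to_start"]),
    se_days   = s$coefficients["days_to_start", "Std. Error"],
    adj_r2    = s$adj.r.squared,
    n         = nobs(m)
  )
}))
print(round(results, 3))

# F-tests comparing each model with the one nested inside it.
print(anova(models[[1]], models[[2]], models[[3]]))
```

Tracking how the coefficient of interest moves as controls are added is often as informative as the final model itself.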
5. Threats to Validity, Regression Diagnostics: Your analysis should include discussion of
potential violations of the Gauss-Markov Assumptions. If you exclude variables because of high
multicollinearity, please explain why, and present the appropriate diagnostics. You should
discuss potential problems with the Zero Conditional Mean and Homoskedasticity assumptions.
If such problems exist, discuss the implications for your analysis. Deal with these problems as
you are able; if you are unable to address them sufficiently, discuss the impact on your ability to
estimate regression parameters and conduct hypothesis testing.
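The diagnostics mentioned above can be computed in base R, as in the following sketch. The data are simulated so that the regressors are correlated and the errors are heteroskedastic by construction; x1, x2, and y are placeholder names. Packages such as car, lmtest, and sandwich offer equivalent functions, and the hand-rolled versions here are only meant to show what is being calculated.

```r
set.seed(3)

# ASSUMPTION: simulated data with built-in collinearity and
# heteroskedasticity, purely for illustration.
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)                      # correlated with x1
y  <- 1 + 2 * x1 - 1 * x2 + rnorm(n, sd = exp(0.5 * x1)) # error variance grows with x1
dat <- data.frame(y, x1, x2)
m   <- lm(y ~ x1 + x2, data = dat)

# Multicollinearity: variance inflation factor, VIF_j = 1 / (1 - R^2_j),
# where R^2_j comes from regressing x_j on the other regressors.
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2, data = dat))$r.squared)

# Homoskedasticity: Breusch-Pagan test (studentized form). Regress the
# squared residuals on the regressors; n * R^2 is chi-squared with
# df equal to the number of regressors under the null.
aux     <- lm(resid(m)^2 ~ x1 + x2, data = dat)
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 2, lower.tail = FALSE)

# Heteroskedasticity-robust (HC1) standard errors via the sandwich formula.
X         <- model.matrix(m)
u         <- resid(m)
bread     <- solve(crossprod(X))        # (X'X)^{-1}
meat      <- crossprod(X * u)           # sum of u_i^2 * x_i x_i'
vcov_hc1  <- bread %*% meat %*% bread * n / (n - ncol(X))
robust_se <- sqrt(diag(vcov_hc1))

cat("VIF for x1:", round(vif_x1, 2), "\n")
cat("Breusch-Pagan statistic:", round(bp_stat, 2), " p-value:", signif(bp_p, 3), "\n")
print(round(cbind(classical_se = sqrt(diag(vcov(m))), robust_se = robust_se), 3))
```

If the robust and classical standard errors differ noticeably, hypothesis tests should rely on the robust ones.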
6. Discussion and Conclusion: You should conclude with a thoughtful summary of your results,
and a clear set of policy-relevant conclusions. You should also discuss the limits of your
analysis, including problems with the data (e.g., selection bias and measurement error). How
would you improve this research design? What would be the next steps in your research?
A final note: You will do much more analysis than you can present in the memo and graphics. A
huge part of the work here will be in compressing what you have done into your findings. You
will need to spend time on the writing and presentation, so make sure to leave yourself time to
do that. We recommend spending a few days getting to know your data, reading, and planning.
Then try to consolidate your analysis to a few days, and spend Week 10 writing, editing, and
putting together your final deliverables.
3 Grading Rubric
The project is out of 60 points total:
25pts: Written Memo
• Have you clearly articulated the motivation for the analysis, your theory, and how you used the
data to test your hypotheses? What outcomes are you examining, and what is/are the main
independent variables of interest? Are there other observable implications of your
theory/hypothesis? How did you examine those? (6pts)
• What are alternative explanations for what you find and how did you account for them? What
kinds of controls did you use and why? How did you balance the various goals of model building
(thoroughness v. simplicity, etc.)? Have you interpreted your models clearly and correctly? Does
your analysis progression make sense? You should be telling a story here. (6pts)
• Have you addressed potential issues with data (outliers, measurement error) and violations of the
Gauss-Markov assumptions, and dealt with them as you are able? (5pts)
• Have you summarized your findings with appropriate policy-relevant conclusions and
discussed limitations to your analysis? (5pts)
• Presentation: Is your memo clearly-written, easy to follow, and compelling? Have you
eliminated spelling and grammatical errors? (3pts)
25pts: Supporting Information
• Motivating figure - you should have one figure that motivates your analysis clearly by providing
a visual display of the question. Remember that the caption of your figure should explain to the
reader what is meaningful. (5pts)
• It is said that a picture is worth a thousand words. Have you used additional graphics/figures to
tell your story (motivation, illuminating results, diagnosing problems, etc.)? Remember, you can
have up to 4 items including your regression table. (5pts)
• Have you included a clearly-labeled regression table with all of your results? Have you
summarized the table clearly with a caption of a few sentences? Remember that variable names
are often meaningless to a reader, and you can often summarize grouping variables and
controls for clarity. (10pts)
Remember that presentation matters for each of these items: Are your graphics nice to
look at; is your table easy to read?
Would we put it on the wall here at GPS?
10pts: .R script
• Does your script run without error? (5pts)
• Does your script run and replicate everything in your report? Have you commented it so we
can navigate it easily? We will take a more in-depth look here. (5pts)