This dissertation is composed of four essays that explore modelling uncertainty along two major directions. The first two cover ordinary and general ridge-type shrinkage estimation developed from model averaging and kernel density estimation. The third critically reviews the recent literature on model averaging and model selection, both parametric and nonparametric, and proposes topics for future work. The last focuses on nonparametric panel data estimation with random effects. Chapter 2 studies ordinary ridge-type shrinkage estimation extensively and proposes a class of well-behaved ordinary ridge-type semiparametric estimators. Theoretical derivations, Monte Carlo simulations, and empirical out-of-sample forecasts are used to demonstrate their usefulness in reducing mean squared error, i.e., risk. Chapter 3 extends the work of Chapter 2 to general ridge regression. By connecting general ridge regression with kernel density estimation, an asymptotically optimal semiparametric ridge-type estimator is built. By connecting general ridge regression with model averaging, a class of model averaging ridge-type estimators is obtained. These estimators are observed to improve upon the feasible general ridge estimators to different degrees when the model uncertainties, i.e., the error variances, differ. To encourage a better understanding of model averaging and model selection, Chapter 4 gives a comprehensive review and analysis of the literature on these topics from a frequentist point of view, covering recent developments in both parametric and nonparametric procedures. Chapter 5 investigates panel data estimation in a nonparametric setting. The proposed two-stage estimator behaves well in Monte Carlo simulations. In addition, illustrative empirical examples in health economics and environmental economics are presented.
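As a point of reference for the ridge-type shrinkage described in this abstract, the textbook ordinary ridge estimator is (X'X + kI)^{-1}X'y, which reduces to OLS at k = 0. The sketch below is only a minimal illustration of that baseline, not the semiparametric or model averaging estimators proposed in the dissertation; the function names, the value k = 1 and the toy data are all assumptions made for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge(X, y, k):
    """Ordinary ridge estimator (X'X + k I)^{-1} X'y; k = 0 gives OLS."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (k if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical toy data with two nearly collinear regressors (no intercept).
X = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
y = [2.0, 4.1, 6.2, 7.9, 10.1]

beta_ols = ridge(X, y, 0.0)    # unshrunk estimate
beta_ridge = ridge(X, y, 1.0)  # shrunk towards zero
print(beta_ols, beta_ridge)
```

With nearly collinear regressors the OLS coefficients are unstable, and the ridge penalty trades a small bias for a reduction in variance; how to choose k well is the kind of question the abstract addresses.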

## Scholarly Works (38 results)

My dissertation consists of six essays which contribute new theoretical results to two econometrics frontiers: nonparametrics and finite sample econometrics. Chapters 2 and 3 discuss estimation and inference in nonparametric and semiparametric models. In chapter 2 an efficient two-step estimator is developed for a single nonparametric regression model with a general parametric error covariance. By fully utilizing the information contained in the error covariance, the newly developed method is more efficient than the conventional local linear least squares (LLLS) estimator and some other two-step estimators. The corresponding asymptotic theorems are derived, and a Monte Carlo study shows the relative efficiency gain of the proposed estimator. Chapter 3 systematically develops a new set of results for seemingly unrelated regression (SUR) analysis within a nonparametric and semiparametric framework. We study the properties of the LLLS and local linear weighted least squares (LLWLS) estimators, provide an efficient two-step estimation procedure for the system, and establish the asymptotic theorems under both unconditional and conditional error variance-covariance cases. Estimation procedures for various nonparametric and semiparametric SUR models are proposed. In addition, two nonparametric goodness-of-fit measures for the system are given. Chapter 4 applies the estimation methods developed in chapters 2 and 3 to an empirical analysis of the return to public capital in the U.S.
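For readers unfamiliar with the conventional LLLS benchmark mentioned above, the following is a minimal sketch of a local linear least squares fit at a single point. It is the standard kernel-weighted regression of y on (1, x - x0), not the efficient two-step estimator developed in the dissertation; the Gaussian kernel, the bandwidth value and the toy data are assumptions chosen for illustration.

```python
import math


def local_linear(x, y, x0, h):
    """Local linear least squares (LLLS) fit at x0 with a Gaussian kernel.

    Minimizes sum_i K((x_i - x0)/h) * (y_i - a - b*(x_i - x0))**2 and
    returns (a, b), the estimates of m(x0) and its first derivative.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(x, y):
        d = xi - x0
        w = math.exp(-0.5 * (d / h) ** 2)  # kernel weight
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    det = s0 * s2 - s1 * s1
    a = (s2 * t0 - s1 * t1) / det  # solves the 2x2 weighted normal equations
    b = (s0 * t1 - s1 * t0) / det
    return a, b


# Hypothetical example: noiseless data from m(x) = sin(x) on [0, 2].
x = [i / 10 for i in range(21)]
y = [math.sin(xi) for xi in x]
m_hat, slope_hat = local_linear(x, y, x0=1.0, h=0.3)
print(m_hat, slope_hat)
```

This estimator treats all observations as having uncorrelated, homoskedastic errors; the abstract's point is that efficiency can be gained by using the error covariance structure.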

Chapters 5 to 6 study the finite sample properties of the mean reversion parameter estimator in continuous time models. In chapter 5 we approximate the bias of the estimator for the Levy-based Ornstein-Uhlenbeck (OU) process, and propose bias corrected estimators. In chapter 6 the exact distribution of the MLE is investigated under different scenarios: known or unknown drift term, fixed or random start-up value, and zero or positive . The numerical calculations demonstrate the remarkably reliable performance of the proposed exact approach.

In chapter 7 we study the efficiency of the coefficient of determination based on final prediction error and compare it with conventional goodness-of-fit measures in linear regression models with both normal and non-normal disturbances. The efficiency results show that the R2 based on final prediction error is of practical use in empirical analysis, for example in panel data analysis and time series analysis.

Nonparametric approaches are widely used because they do not require assumptions on the distribution of the data. Despite their extensive development, nonparametric hypothesis testing has received less attention than nonparametric estimation, even though it is one of the key components of econometric analysis. This dissertation has two main parts. I first explore the systematic development of the current nonparametric tests and provide results on testing linearity as an illustration. I then develop new nonparametric tests for detecting endogeneity in cross-sectional data and panel data, respectively.

Elaborating each test's performance can be meaningful in that we can decide which test to use depending on the hypothesis and even construct a new test based on such a relationship. Under the hypotheses for linearity, Chapter 2 will categorize the types of nonparametric tests and discuss the analytical relationship of those tests. By imposing some conditions, I can compare the local power of each test asymptotically. While examining the analytical relationship, I develop a nonparametric Rao-Score test and show it to be equivalent to the Su and Ullah (2013) test.

Having analyzed the analytical relationship among the current nonparametric tests, I focus on developing a new nonparametric test for endogeneity. Since endogeneity is commonly observed in many economic contexts, detecting its presence is a preliminary step in choosing an estimation strategy. In Chapter 3, I construct a test using the control function approach under a triangular simultaneous equations model. The test is simple to implement and is able to capture locally nonlinear correlation through kernel weighting. Furthermore, I apply the test to empirical analyses and show that its results contradict those of the parametric test.

Endogeneity is an important model specification issue not only in the triangular simultaneous equations model but also in the panel data setting. The estimation strategy differs depending on the presence of endogeneity between the individual specific effects and the explanatory variable. I propose a new estimation method for the nonparametric panel random effects model and construct a new test for endogeneity using the residuals from the proposed estimation method. By obtaining the individual specific effects in the random effects model, I construct a test over the i index instead of over both the i index and time. With a large T, the test performs well in terms of size and power.

This dissertation covers several topics in estimation and forecasting in panel data models.

Chapter one considers the panel data model with correlated individual effects and regressors. We form a combined estimator by combining the fixed effects (FE) and random effects (RE) estimators. We derive the asymptotic distribution and the asymptotic risk of our estimator using a local asymptotic framework. We show that if the regressor dimension exceeds two, the asymptotic risk of the combined estimator is strictly less than that of the FE estimator. Our simulation results show that the combined estimator can reduce finite sample MSE relative to the FE estimator for all degrees of endogeneity and heterogeneity, as well as relative to the RE estimator for moderate to large degrees of endogeneity and heterogeneity. We also apply the combined estimator to revisit the relationship between public capital infrastructure and private economic performance.

Chapter two extends chapter one to the semi-parametric (SP) framework and proposes a combined SP-FE and SP-RE estimator. Chapter three considers the panel data model with correlated residuals and regressors. In the presence of such correlation, both FE and RE estimators yield biased and inconsistent estimates of the parameter. We propose a combined FE and FE-2SLS estimator, and a combined RE and RE-2SLS estimator.

Chapter four considers regression models for panel data that exhibit cross-section dependence due to common shocks. Models with factor structures for errors and regressors are considered. In this case, the FE estimator is inconsistent. To solve this problem, Pesaran (2006) introduced the common correlated effects pooled (CCEP) estimator. We propose a combined FE and CCEP estimator, and show that under certain conditions, the combined estimator has strictly smaller risk than the CCEP estimator. Finally, we use the state-level housing data of Holly et al. (2010) to show the applicability of the combined estimator.

Chapter five proposes a combined approach to econometric forecasting. Monte Carlo simulations are conducted to evaluate the performance of the combined forecast in finite samples. We contrast the out-of-sample forecast performance of the FE, RE and the combined approaches using the electricity and natural gas data sets.

The theme of this dissertation is the risk and return modeling of financial time series. The dissertation is broadly divided into three chapters: the first focuses on measuring risks and uncertainty in the U.S. stock market, the second on measuring risks of individual financial assets, and the last on predicting stock returns. The first chapter studies the movement of the S&P 500 index driven by uncertainty and fear that cannot be explained by economic fundamentals. A new measure of uncertainty is introduced, using the tone of news media coverage on the equity market and the economy, aggregate holding of safe financial assets, and volatility in S&P 500 options trading. Major contributions of this chapter include uncovering a significant non-linear relationship between uncertainty and changes in the business cycle. An increase in uncertainty is found to be associated with drastic but short-lived falls in stock prices, while economic fundamentals have a small but prolonged effect on stock market prices. The second chapter proposes a new Value at Risk (VaR) and Expected Shortfall (ES) estimation procedure that involves estimating the variance of returns using the conditional semiparametric approach introduced by Mishra, Su and Ullah (2010). Thus, the estimation of the variance is independent of the assumed distribution. Monte Carlo simulations are used to compare the performance of these new estimates using normal, Student-t, Laplace, ARCH, GARCH, and GJR GARCH distributions. VaR and ES for Amazon, SP500, Microsoft, Nasdaq, USD/GBP and USD/Yen are estimated, and the performance of each estimation method is further tested using a battery of tests. The third chapter explores whether non-parametric and semi-parametric methods can reduce the bias in predictive regressions in the presence of high persistence in the predictive variables and a non-linear relationship with the dependent variable. 
The predictive performance of the independent variables suggested in the literature to predict stock returns is re-evaluated in sample and out of sample using two-step non-parametric and semi-parametric models. Empirical RMSE is used to compare the proposed models with the historical average, OLS and non-parametric regression models.
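To make the two risk measures in the second chapter concrete, the sketch below computes plain empirical (historical simulation) estimates of VaR and ES from a return series. This is a simple baseline, not the conditional semiparametric procedure proposed in the dissertation; the function name, the tail probability and the toy returns are assumptions for illustration.

```python
def var_es(returns, alpha=0.05):
    """Empirical Value at Risk and Expected Shortfall at tail probability alpha.

    Both are reported as positive loss numbers. The tail is taken to be the
    int(alpha * n) largest losses (at least one observation).
    """
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    n_tail = max(1, int(alpha * len(losses)))
    tail = losses[:n_tail]
    var = tail[-1]           # smallest loss inside the tail
    es = sum(tail) / n_tail  # average loss inside the tail
    return var, es


# Hypothetical daily returns.
returns = [0.012, -0.034, 0.005, -0.008, 0.021, -0.051, 0.003, -0.017,
           0.009, -0.002, 0.015, -0.026, 0.007, -0.011, 0.001, 0.018,
           -0.043, 0.004, -0.006, 0.010]
var_25, es_25 = var_es(returns, alpha=0.25)
print(var_25, es_25)
```

By construction ES is never smaller than VaR at the same level, since it averages the losses at or beyond the VaR threshold.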

This dissertation is composed of three research topics. The first topic proposes an intrabasin allocate-and-trade institution to manage the eastern Nile River basin, with the objective of increasing the basin's overall welfare by improving efficiency, equity and sustainability. By developing the Nile Environmental and Economic Optimization Model (NEEOM), we estimate the current, planned and improved welfare values. We find that a water trade institution can achieve nearly 100% of the welfare created by economically efficient allocation, and secure equivalent volumes of water compared with the status quo scheme. We estimate that riparian countries could raise about $660 million per annum for protecting and conserving the natural resources of the basin. Finally, using Global Circulation Models, we find that the institution will recover nearly all of the efficient outcomes.

The second topic studies the behavior of carbon price volatility before, during and after the 2008/09 global recession using a Markov regime-switching model. The results show that an unregulated voluntary carbon market was in a high-volatility regime during, and for two years before, the recession. A regulated compliance carbon market was in a relatively stable, low-volatility regime over these periods, except at the end of the recession. It can be inferred, however, that the high-volatility regimes were not caused by the recession per se. The Wald tests show that there were distinct low- and high-volatility regimes during the recession period, indicating that the recession aggravated the volatility of both voluntary and compliance markets.

The third topic studies the relationship between economic growth and pollution using nonparametric econometric techniques. The results indicate a partial relationship between GDP per capita and the level of PM10 pollution for low- and high-income countries. Hence, environmental policies for reducing the level of PM10 pollution have to emphasize middle-income and oil-producing high-income countries, which show an unprecedented increase in the level of PM10 pollution. Further, the Li and Wang test indicates that nonparametric analysis produces better results than quadratic and cubic specifications. Semiparametric models show a decreasing pollution level as income rises and improve the smoothness of the relationship.

My dissertation is comprised of three independent empirical chapters. Below is a brief description of each.

The first chapter is titled "Infant Mortality Rates in India: District-Level Variations and Correlations". This paper examines the correlates of infant mortality in India using district-level data from the 1991 and 2001 Census of India. While infant mortality rates have dropped across districts over this ten-year period, there still remains a lot of heterogeneity across districts and hence across the states. Using a panel data set of 666 districts, the analysis seeks to determine which social and/or economic factors play an important role in reducing infant mortality rates. In our empirical work, the explanatory variables used are male and female literacy, male and female labor force participation, the level of poverty, urbanization and other socio-economic variables. We use quantile regression analysis to determine which of these factors impact infant mortality. Quantile regression is preferred over OLS because it allows us to estimate models for the conditional median function and the full range of other conditional quantile functions, and therefore provides a more complete statistical analysis of the stochastic relationship among random variables. The analysis brings out the powerful influence of women's characteristics on infant mortality, especially literacy and labor force participation. Increases in both of these variables significantly reduce child mortality at the district level. Improvements in male laborers in non-agricultural work and reductions in poverty also reduce child mortality, but their quantitative impact is weak in comparison. Further, the non-parametric analysis reinforces the results found in the parametric section. The results indicate that the impact of the covariates is strongest in the districts which lie in the center of the conditional distribution, rather than in those at the extremes. 
This analysis allows us to determine in which districts additional targeted policies would yield the greatest reduction in infant mortality.

The second paper is titled "Same-sex siblings and their affect on mothers labor supply in South Africa". This study looks at the labor-supply consequences of childbearing for women in South Africa. However, due to the endogeneity of fertility, the research question becomes complicated. Using parental preferences for a mixed sibling-sex composition, I construct instrumental variables (IV) estimates of the effect of childbearing on the labor supply of all women aged 15-35 years having more than two children. The data used for the study are the 10% household sample from the 2001 census. The covariate of interest in the labor supply model is the indicator "More than two children". Demographic variables include mother's age, age of the mother at first birth, years of schooling, indicators for race, and an urban dummy. Labor-supply variables include hours worked per week, worked for pay and total income. Unlike previous studies, which restrict their sample to include only female household heads or spouses of male household heads, this study expands the sample to include all childbearing women in the household aged 15-35. The IV estimates that exploit the fertility consequences of sibling sex do not confirm the OLS estimates showing that more children lead to a lowering of female labor supply. While OLS estimates exaggerate the causal effect of children, children seem to have a smaller effect on the labor supply of college-educated women. I find that the labor market outcomes of childbearing are more severe for married women in South Africa.

The third paper is on "The effect of Immigration on Ethnic Composition and Occupational Reallocation". Over the last 30 years, the U.S. labor market has been transformed by the 'second great migration'. Much of this immigration has been among the lower skilled; the share of High School Dropout (HSD) workers who are foreign born increased from 12% in 1980 to 44% in 2007. At the same time, native born HSD workers grew more slowly than any other educational category, falling by nearly 6%. These two outcomes have inevitably led to much speculation that immigrants depress the wages of similarly skilled natives. The labor economics literature, however, has found little empirical evidence to support this claim. We aim to assess whether the impact of immigration is mitigated by the occupational transition of natives. Because Black workers are overrepresented among HSDs, we focus on their labor market outcomes. We use data from the 5% public use sample of the census (1980, 1990 and 2000) as well as the 1% sample of the population from the American Community Survey (2005, 2006 and 2007) to estimate the effect of occupational reallocation on the wages of Black workers as well as the effect of immigration on reallocation. A shift-share analysis reveals that occupational transitions caused wages for Blacks to rise by 46% more than they would have with a static occupational distribution. However, we find that these occupational shifts were due to a crowding out effect of Hispanics on Black occupations: a 10 percentage point increase in the share of workers in an occupation who are Hispanic leads to a 5 percentage point decrease in the share of Black workers in that occupation. This effect is large enough to explain a substantial part of the decline in the importance of certain occupations for Blacks during the period of study. 
We find a strong correlation between the importance of occupations to Hispanics and Blacks, suggesting not only that most occupational transition for these two groups has been driven by outside factors such as trade and technological change, but also that these shocks are affecting the two groups similarly.

Chapters 1, 3 and 4 focus on the analysis of interval-valued data (joint with Professor González-Rivera). In Chapter 1, we propose a constrained regression model that preserves the natural order of the interval in all instances. Within the framework of interval time series, we specify a general dynamic bivariate system for the upper and lower bounds of the intervals, and propose a (modified) two-step estimator. Monte Carlo simulations show good finite sample properties of the proposed estimators. We model the daily interval of low/high SP500 returns before and after 2007, and find that truncation is very severe during and after the financial crisis of 2008, so that a modified two-step procedure should be implemented. In Chapters 3 and 4, we adopt an alternative modelling approach for interval-valued data that exploits the extreme property of the lower/upper bounds of the interval, which is ignored in the existing literature. Specifically, Chapters 3 and 4 propose two different models and estimation strategies (ML and semiparametric estimation, respectively) that combine the knowledge of order statistics and extreme value theory with interval-valued data.

As a separate strand of research, in Chapter 2 (joint with Professor Ullah), we propose an adaptive spline estimator based on Friedman's (1991) multivariate adaptive regression splines. The model takes the form of an expansion in the cross product spline bases, where the number of spline functions, the degree of the tensor product and the knot locations are selected adaptively by generalized cross validation. Our estimator is more tractable not only in computational implementation but in theoretical deduction as well. We establish the asymptotic normality of our adaptive estimator, and obtain the optimal convergence rate that it can possibly achieve. The optimal convergence rate depends on the order ratio of the number of selected spline basis functions to the total number of potential ones. The Monte Carlo simulation, comparing the adaptive estimator with classical regression splines under various DGP settings, shows that our estimator improves more significantly upon classical regression splines, producing a smaller AMSE, when the DGP has multivariate covariates. We also apply our adaptive estimator to the study of the effect of public capital stock on the gross state product using the pooled panel data set in Baltagi and Pinnoi (1995).

In the first chapter of this thesis, we propose a penalized splines (P-splines) estimator for the random effects panel data model. Although spline methods are a nonparametric technique, one of their most attractive properties is that their setup is analogous to that of a parametric regression model. Compared to kernel-based methods, however, spline methods are far less developed, at least for econometric models. The asymptotic results for our P-splines estimator are established in this chapter. A Monte Carlo simulation is conducted to compare the performance of our P-splines estimator with different kernel-based estimators proposed in the recent literature. It turns out that the P-splines estimator consistently performs well and is computationally efficient.

In the second chapter, we develop a general procedure to derive the asymptotic variance-covariance matrices of two-stage estimators that can be used to estimate simultaneous equation systems with a mixture of any number of binary and continuous dependent variables. To demonstrate the usefulness of our procedure, we apply our formulas to empirical data with one continuous and two binary dependent variables in the simultaneous equations system. Our results are expected to be of tremendous help to numerous practitioners of econometrics using two-stage procedures to estimate their simultaneous equations models.

This dissertation covers several topics in the second-order bias and mean squared error (MSE) of quantile and expectile estimators.

Chapter one presents the introduction of this dissertation. The finite sample theory using higher order asymptotics provides better approximations of the bias and MSE for a class of estimators. Rilstone, Srivastava and Ullah (1996) provided the second-order bias results of conditional mean regression. The goal of this dissertation is to develop analytical results on the second-order bias and MSE for quantile and expectile estimators.

Chapter two develops new analytical results on the second-order bias up to order O(N^-1) and the MSE up to order O(N^-2) of the conditional quantile regression estimators. First, we provide general results on the second-order bias and MSE of conditional quantile estimators. The second-order bias result enables an improved bias correction and thus improved quantile estimation. In particular, we show that the second-order bias is much larger towards the tails of the conditional density than near the median, and therefore the benefit of the second-order bias correction is greater when we are interested in the deeper tail quantiles, e.g., in the study of income distribution and financial risk management. The higher order MSE result for quantile estimation also enables us to better understand the sources of estimation uncertainty. Next, we consider three special cases of the general results: unconditional quantile estimation, conditional quantile regression with a binary covariate, and instrumental variable quantile regression (IVQR). For each of these special cases, we provide the second-order bias and MSE to illustrate their behavior, which depends on certain parameters and distributional characteristics. The Monte Carlo simulation indicates that the bias is larger at the extreme low and high tail quantiles, and that the second-order bias corrected estimator behaves better than the uncorrected one in both conditional and unconditional quantile regression. The second-order bias corrected estimators are numerically much closer to the true parameter values of the data generating processes. Because the higher order bias and MSE decrease as the sample size increases or as the regression error variance decreases, the benefits of the finite sample theory are more apparent when there are larger sampling errors in estimation.
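For the simplest special case named above, unconditional quantile estimation, the estimator can be written as the minimizer of the average check loss. The sketch below shows only that standard definition, assuming a brute-force search over the sample points; it does not implement the second-order bias correction derived in the dissertation, and the function names and data are illustrative.

```python
def check_loss(u, tau):
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile(y, tau):
    """Unconditional tau-quantile as a minimizer of the average check loss.

    The objective is piecewise linear and convex, so a minimizer always
    exists among the sample points and a search over them suffices.
    """
    return min(sorted(y), key=lambda q: sum(check_loss(yi - q, tau) for yi in y))


# Hypothetical sample.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_quantile(y, 0.5))   # median
print(sample_quantile(y, 0.25))  # lower quartile
```

In the tails few observations lie beyond the target quantile, which gives some intuition for why the abstract reports larger finite sample bias at extreme quantiles.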

Chapter three develops the second-order asymptotic properties (bias and mean squared error) of the asymmetric least squares (ALS) or expectile estimator, extending the second-order asymptotic results for the symmetric least squares (LS) estimators of Rilstone, Srivastava and Ullah (1996). The LS gives the mean regression function while the ALS gives the "expectile" regression function, a generalization of the usual regression function. The second-order bias result enables an improved bias correction and thus improved ALS estimation. In particular, we show that the second-order bias is much larger when the asymmetry is stronger, and therefore the benefit of the second-order bias correction is greater when we are interested in extreme expectiles, which are used as a risk measure in financial economics. The higher order MSE result for ALS estimation also enables us to better understand the sources of estimation uncertainty. The Monte Carlo simulation confirms the benefits of the second-order asymptotic theory and indicates that the second-order bias is larger at the extreme low and high expectiles, and that the second-order bias correction improves the ALS estimator in terms of bias.
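The relationship between LS and ALS described above can be illustrated for an unconditional sample: the tau-expectile minimizes a squared loss in which positive residuals get weight tau and the rest get weight 1 - tau. The sketch below solves the first-order condition by iteratively reweighted averaging; it is a generic illustration under these assumptions, not the dissertation's bias-corrected estimator, and the function name, tolerance and data are hypothetical.

```python
def expectile(y, tau, tol=1e-12, max_iter=1000):
    """Sample tau-expectile via iteratively reweighted averaging.

    Solves sum_i w_i * (y_i - m) = 0 with w_i = tau if y_i > m else 1 - tau,
    the first-order condition of the asymmetric least squares criterion.
    """
    m = sum(y) / len(y)  # start from the mean, which is the tau = 0.5 expectile
    for _ in range(max_iter):
        w = [tau if yi > m else 1.0 - tau for yi in y]
        m_new = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


# Hypothetical right-skewed sample.
y = [1.0, 2.0, 3.0, 4.0, 10.0]
low, mid, high = expectile(y, 0.1), expectile(y, 0.5), expectile(y, 0.9)
print(low, mid, high)
```

At tau = 0.5 the weights are symmetric and the expectile coincides with the sample mean, which is the sense in which ALS generalizes LS.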

Chapter four introduces predictive quantile regression and predictive expectile regression. Predictive regression is a fundamental econometric model and is widely discussed in the finance literature. This chapter focuses on second-order bias reduction for both regression models, which enables us to obtain better predictive estimates. An empirical application to stock return prediction using the dividend yield illustrates the benefit of the proposed second-order bias reduction method. We show that the bias is larger at the tails of the stock return distribution.

Chapter five contains the conclusion.