Improved Dynamical Constraints on the Mass of the central Black Hole in NGC 404

We explore the nucleus of the nearby 10$^9M_{\odot}$~early-type galaxy (ETGs), NGC~404, using Hubble Space Telescope (HST)/STIS spectroscopy and WFC3 imaging. We first present evidence for nuclear variability in UV, optical, and infrared filters over a time period of 15~years. This variability adds to the already substantial evidence for an accreting black hole at the center of NGC~404. We then redetermine the dynamical black hole mass in NGC~404 including modeling of the nuclear stellar populations. We combine HST/STIS spectroscopy with WFC3 images to create a local color-$M/L$~relation derived from stellar population modeling of the STIS data. We then use this to create a mass model for the nuclear region. We use Jeans modeling to fit this mass model to adaptive optics (AO) stellar kinematic observations from Gemini/NIFS. From our stellar dynamical modeling, we find a 3$\sigma$ upper limit on the black hole mass of $1.5\times10^5M_{\odot}$. Given the accretion evidence for a black hole, this upper limit makes NGC~404 the lowest mass central black hole with dynamical mass constraints. We find that the kinematics of H$_2$ emission line gas show evidence for non-gravitational motions preventing the use of gas dynamical modeling to constrain the black hole mass. Our stellar population modeling also reveals that the central, counter-rotating region of the nuclear cluster is dominated by $\sim$1~Gyr old populations.


Introduction
Central massive black holes (MBHs) with masses of ∼10 6-9  M are ubiquitous components of massive galaxies (>10 10  M , Kormendy & Ho 2013). However, the demographics of black holes (BHs) in the centers of lower mass galaxies are less well constrained. Any MBHs that exist in these low-mass (<10 10  M ) galaxies are likely to have lower BH masses, 10 6  M (e.g., van der Marel 2004; Greene & Ho 2007;McKernan et al. 2011;Dong et al. 2012;Neumayer & Walcher 2012;Reines et al. 2013;Yuan et al. 2014). The presence of MBHs in lower mass galaxies (known as the occupation fraction) and the mass of these MBHs are sensitive to the mechanism that forms seed MBHs in the early universe (van Wassenhove et al. 2010;Volonteri 2010). In fact, measurements of the mass and occupation fraction of MBHs in low mass galaxies are the only tools we currently have for constraining the formation of MBHs.
Unfortunately, the occupation fraction and masses of MBHs in low mass galaxies are difficult to measure because most galaxies are too far away to measure the dynamical effect of an 10 6  M MBH with current instrumentation. Thus, our knowledge of MBHs in low mass galaxies is limited to galaxies where active galactic nuclei (AGNs) are observed (e.g., Filippenko & Sargent 1989;Barth et al. 2004;Greene & Ho 2004;Satyapal et al. 2007;Reines et al. 2013;Moran et al. 2014;Baldassare et al. 2015), or where tidal disruption events occur (e.g., Maksym et al. 2014). These objects provide strong evidence that some MBHs do exist in galaxies with stellar mass as low as a ∼3×10 8  M . However, evidence for an AGN is detected in at most a few percent of galaxies (Moran et al. 2014), while rough MBH mass estimates based on broad line AGNs are available for only a small number of objects (Greene & Ho 2007;Reines et al. 2013;Baldassare et al. 2015;Reines & Volonteri 2015). Dynamical estimates are only possible in the very nearest galaxies, and only a handful of dynamical MBH mass estimates or significant non-detections are available (Gebhardt et al. 2001;Verolme et al. 2002;Valluri et al. 2005;Seth et al. 2010;den Brok et al. 2015). This lack of good MBH mass estimates also means that scaling relations between galaxy properties (e.g., bulge luminosity and dispersion) lack data at the low-mass end (see recent compilations Kormendy & Ho 2013;Graham & Scott 2015;Saglia et al. 2016). Additional dynamical BH mass measurements in low-mass galaxies are therefore important for understanding the physics that underlies known scaling relations.
Most low-mass galaxies also host massive nuclear star clusters (NSCs, e.g., Böker et al. 2002;Côté et al. 2006;den Brok et al. 2014;Georgiev & Böker 2014). There are a number of galaxies which contain both an MBH and an NSC (e.g., Filippenko & Ho 2003;Seth et al. 2008a). However, the relationship between MBHs and NSCs is not yet well understood. Initial discoveries of scaling relationships between NSCs and early-type galaxies (ETGs) suggested that low-mass galaxies may form NSCs rather than MBHs Wehner & Harris 2006), more recent studies suggest a transition with galaxies increasingly favoring an MBH in galaxies of higher mass (Graham & Spitler 2009;Neumayer & Walcher 2012;Kormendy & Ho 2013;Georgiev et al. 2016). This transition may occur due to tidal disruption effects (Antonini 2013;Antonini et al. 2015aAntonini et al. , 2015b or feedback (McLaughlin & van der Marel 2005;Nayakshin et al. 2009).
The nearby galaxy NGC404 provides an excellent opportunity to study the relationships of NSCs and MBHs. NGC404 is a dwarf S0 galaxy ( M 10 9  M , Seth et al. 2010) at a distance of just 3.06±0.37 Mpc (Karachentsev et al. 2002). The galaxy appears to host an MBH within its prominent central NSC (Seth et al. 2010;Binder et al. 2011;Nyland et al. 2012;Paragi et al. 2014).
The NSC of NGC404 was studied in depth by Seth et al. (2010, hereafter S10). They found a dynamical mass of the NSC of (1.1 ± 0.2)×10 7  M , and stellar population analysis suggests that half this mass formed ∼1 Gyr ago. Gas and star formation in the outer parts of the galaxy suggest that NGC404 acquired gas ∼1 Gyr ago during a minor gas-rich merger (del Río et al. 2004;Bouchard et al. 2010;Thilker et al. 2010); S10 proposed the 1 Gyr stars in the NSC were formed due to this merger. However, consistent with its early-type morphological classification, the galaxy is very old, with 90% of the stellar mass being formed more than 8 Gyr ago (Williams et al. 2010).
Recent observations have shown that the nucleus of NGC404 appears to host an accreting BH that powers a low-luminosity AGN. The first indication was from the optical spectrum, which was found to have a LINER classification (Stauffer 1982;Keel 1983;Ho et al. 1997); there is considerable debate on the nature of LINER nuclei (e.g., Singh et al. 2013), but they have frequently been found to be associated with AGNs (e.g., Nagar et al. 2005). The nucleus has also been found to have a hard and compact X-ray core with =-+ L 1.3 10 X 0.5 0.8 37 erg s −1 (Binder et al. 2011), and a compact radio core (Nyland et al. 2012) at arc-second scales. Maoz et al. (2005) also found that the UV emission is variable, declining by a factor of ∼3 between 1993 and 2002m while S10 also found some evidence for variability in the nearinfrared (NIR) due to hot dust around the AGN. The mid-IR spectrum of NGC404 shows evidence for high excitation consistent with other AGNs (Satyapal et al. 2004). In particular, the ratio of the [O IV] flux relative to other emission lines ([Ne II], [Si II]) is higher than any other LINERs in the sample of Satyapal et al. (2004) and is similar to other known AGNs. However, [Ne V] lines, which are the more reliable indicator of AGN activity (Abel & Satyapal 2008), are not detected. Among these, we find the presence of a hard X-ray core and variable UV emission to be the strongest pieces of evidence for an AGN in NGC404.
Despite this evidence, the proof of an AGN in NGC404 is complicated by recent star formation in the nucleus. Hubble Space Telescope (HST) observations of a H shows a compact emission source at  0. 16 north of the nucleus that may be due to supernova remnants (Pogge et al. 2000). The nuclear UV spectrum also clearly shows evidence for massive stars although these observations leave room for a UV continuum component from an AGN component (Maoz et al. 1998). Paragi et al. (2014) did not detect the radio source on 10mas scales, suggesting either a non-active or low mass BH, and suggesting that at least some of the radio emission is due to star formation. An upcoming paper by K. Nyland et al. (2017, in preparation) bridges the scale of emission sensitivity, and does indeed find some compact radio emission coincident with the nucleus.
Dynamical measurements using high-resolution adaptive optics (AO) data from Gemini/NIFS by S10 indicate the possible detection of an MBH, with a firm upper limit of <10 6  M . More specifically, S10 used two kinematic tracers to constrain the BH mass, stellar kinematics from the CO bandhead at 2.3 μm, and gas kinematics from the H 2 line at 2.12 μm. Their stellar dynamical modeling gave a best fit M BH =5×10 4  M , but consistent with 0  M at 3σ. The kinematic dynamical model of molecular hydrogen emission from the nucleus revealed the best-fit mass for the BH in a large range of M BH =-+ 4.5 10 2.0 3.5 5  M (3σ error bars). Although there is a large discrepancy between the BH mass estimate from the stellar and gas modeling, S10 put a firm upper limit of M BH <10 6  M . The dynamical modeling in S10 depended on a number assumptions, most importantly the assumption of a constant M/L throughout the nucleus. This is a poor assumption due to the obvious spatial gradients of stellar populations within the nucleus. The stellar mass in the central resolution element of the Gemini/NIFS data was comparable to the BH mass (i.e., the sphere of influence was not well resolved), hence, any spatial gradients in the M/Lcan have a significant effect on the BH mass estimate. In this paper, we use new STIS spectroscopy and WFC3 imaging to quantify the spatial variations in the M/Lthroughout the NGC404 nucleus and improve the BH mass estimate using the same Gemini/NIFS data presented in S10. Incorporating M/Lgradients to refine BH mass estimates was recently explored by , who used color gradients to explore possible radial M/L gradients in three giant elliptical galaxies. This paper is organized into seven sections. In Section 2 we describe the observations and discuss their initial reduction and analysis. The nuclear variability, stellar population modeling of the STIS data, and creation of color-M/Land mass maps are presented in Section 3. Jeans modeling of the stellar kinematics incorporating the mass map is presented in Section 4, and the gas dynamical modeling is in Section 5. We discuss and conclude in Sections 6 and 7. We assume a distance of 3.06±0.37 Mpc (Karachentsev et al. 2002), giving a physical scale of 15 pc arcsec −1 . Unless otherwise indicated, all quantities quoted in this paper have been corrected for a foreground extinction A V =0.306 (Schlafly & Finkbeiner 2011) using the interstellar extinction law of Cardelli et al. (1989).

HST/STIS Spectroscopy
The STIS spectroscopic observations (PID: 12611, PI: Seth) were taken on 2012 November 18 with the G430L grating and the  52. 0×  0. 1 slit. This provides spectra over a wavelength range of 2900 Åto 5700 Åwith a pixel size of 2.73 Å, and spectral resolving power of R ∼ 530-1040. Seven individual exposures were taken, each with an exposure time of ∼1890s, for a total exposure time of 13230s. The source was dithered along the slit and centered near the E1 aperture position to minimize charge-transfer inefficiency losses.
Reduced and rectified spectroscopic exposures (x2d files) were downloaded from the Hubble Legacy Archive (HLA). The STIS images suffer from numerous imaging defects, especially vertical defects extending over many rows. To remove these features from our dithered data, we created a median image without dithering which was subtracted from each individual, dithered image. Each dither was separated by nine pixels, thus, some background galaxy light will be included in this median image. However, this effect is very small; the maximum row-averaged flux of the median image is <25% of that at the outermost radius we analyze.
We then combined the seven individual exposures into a single two-dimensional (2D) spectrum. We applied integer offsets of nine pixels between each exposure after verifying these offsets using the position of the nucleus. Before combining exposures, we used sigma-clipping at the 3σ level to remove cosmic rays and bad pixels, and also flag bad pixels based on their data quality (DQ) values. Specifically, we included pixels with DQ<1000. We then combined the exposures by taking a mean value of all remaining unflagged and unclipped pixels from the seven exposure stack. We also created an error frame for the combined image by creating variance frames from the original error frames. Then for each pixel, we added the appropriate variance frames together and divided by the square of the total number of exposures.
The signal-to-noise (S/N) of the central pixel in the combined spectrum is 37 and 50 per pixel at 3700 and 5000 Å, respectively. The S/N drops off to 20 per pixel at 5000 Å for binned data at +(0 475-0 575) and −(0 675-0 825), which are the outermost data we use in our spectral analysis.

HST/WFC3 and WFPC2 Data
The HST imaging data we use in this work are taken from different cameras. They include WFC3 and WFPC2, ACS/WFC, and ACS/HRC imaging; the data we use in this work are summarized in Table 1. Reduced drizzled images were downloaded from the HST/HLA. The HST/WFC3 images were obtained contemporaneously with the above STIS spectroscopy observation, and we use these in combination with the spectroscopic data to create a 2D mass map. We discuss the creation of point-spread functions (PSFs) for the HSTdata in Section 2.4.

Gemini NIFS Data
NGC 404 was observed with Gemini/NIFS on 2008 September 21-22 using the Altair laser guide star system. The nucleus was observed for a total of 4560s (see S10 for details). These data were used to derive stellar kinematics (from the CO band-head region) and the molecular hydrogen kinematics (from the 2.12 μm H 2 1-0S(1) emission line) by S10; we use these data in the stellar and gas dynamical modeling presented in this work in Section 4 and Section 5, respectively.
We briefly review the derivation of the stellar kinematics here; more details are found in S10 and Seth et al. (2008b). We use the penalized pixel fitting (pPXF) code of Cappellari & Emsellem (2004) to measure the stellar line of sight velocity distribution (LOSVD) including the radial velocity, velocity dispersion, and the non-Gaussian terms h 3 and h 4 , which measure the skewness and kurtosis of the LOSVD (van der Marel & Franx 1993). We fit the strong CO bandheads in the wavelength range from 22850 to 23900 Å. Kinematics were derived using eight spectra from the templates of Wallace & Hinkle (1996); these include spectral types from G to M (which show significant CO absorption) and classes from the main sequence to supergiants. We measured the instrumental resolution of each spatial pixel using sky lines. The LOSVD errors were estimated via Monte Carlo simulations using a propagated noise spectrum. Errors in the radial velocities ranged from 0.5 to 6km s −1 changing with S/N (see Figure 8 of S10).
One feature of this data set that is important in our analysis is the assumed center of the NGC404 NSC. S10 used both a dust-emission corrected photocenter at R.A.=01 h 09 m 26 999 and Decl.=35°43 ¢ 04 91) for modeling the gas kinematics and a kinematic center at R.A.=01 h 09 m 26 999 and Decl.=35°4 3 ¢ 04 . 89 for modeling the stellar kinematics. The stellar kinematic center is about half a pixel (∼0 025) south of the photocenter. The uncertainties on these measurements are ∼0 02. We will show in Section 5 that this uncertainty in center translates into large variations in our gas dynamical BH mass estimates, suggesting the gas dynamics results are not robust.

PSF Determination
The different PSFs for our HSTimages at different filters and our Gemini/NIFS spectra is a complicating factor in this analysis. The HSTPSFs play a significant role in creating the mass map of the nucleus while the Gemini/NIFS PSF is critical for the dynamical modeling. In this work, we use two different types of PSFs: (1) for the HST/WFC3 data, we use Tiny Tim PSFs (Krist 1995;Krist et al. 2011), and (2) for the Gemini/ NIFS data we use a two-component inner Gaussian + outer Moffat profile PSF presented in S10.
For the HST/WFC3 PSFs, we create the model PSF for each WFC3 exposure using the Tiny Tim routine for each of the four individual flt exposures. The PSFs were created using the tiny3 task, which includes a charge diffusion kernel and the distortion of a PSF. We then insert these into mock flt images at the position of the nucleus in each individual exposure to simulate our observations. Finally, for each filter, we combine the four PSFs at each of the four dither positions into the final PSF using the Astrodrizzle package (Avila et al. 2012).
For the Gemini/NIFS images, we use the two component PSF fit to the NIFS data from S10. The fits are made by convolving an HSTimage of the NGC404 nucleus to match the NIFS continuum image using a two component PSF: (1) an inner Gaussian component, and (2) a larger scale Moffat profile ( ( ) [( ( ) )] ) S = S + r rr 1 d 0 2 4.765 ) whose size is constrained by fits to the telluric calibrators taken alongside the science images. S10 find the best fit model PSF with an inner Gaussian full width at half maximum (FWHM) of ∼0 12, and an outer Moffat with FWHM of ∼0  . 95 profiles; each containing about half of the light.

Astrometry
Astrometry is an essential step in this work. Below we outline the astrometric alignments we carried out: 1. Due to the small field of view of our WFC3 data, we obtain absolute astrometry for the images by aligning an ACS F814Wimage in which the NSC is saturated to the 2MASS point source catalog. The alignment uses 12 unsaturated sources with a root mean squared offset of  0. 20; this represents our absolute astrometric accuracy. We then align our WFC3 F814Wimage to this image using bright compact stars around the core. 2. Astrometric alignment of NIFS, WFPC2, ACS/HRC, and the new WFC3 data were then tied to the WFC3 F814Wdata. The centroid position of the nucleus in WFC3 F814Wwas used as the reference position to align each image. Although dust clearly affects the area around the nucleus, it appears that the center of the nucleus does not suffer from significant internal extinction as seen in the F547M-F814Wcolor map (Figure 2 of S10) and the F336W-F814Wcolor map (this work, Figure 5). The alignment on the nucleus can be verified by comparing the alignment of sources in these color maps with the alignment of the NIFS Brγ map and the HSTHα image ( Figure 6 in S10). 3. Finally, we align the STIS spectroscopy to the WFC3 image. To do this, we collapse the spectra to create a 1D image of the slit, multiplying by the appropriate filter curve to make synthetic F547Mand F336Wimages. We then compare these 1D images to the astrometrically aligned WFC3 images in the same filters to obtain WCS information on the slit positions. Because of the low S/N in the STIS image at large radius, the fitting of the 1D images was performed at radii <  0. 8. Both image comparisons gave consistent WCS coordinates.

Constraining the AGN Continuum Contribution through Variability
In this section, we describe the procedure to characterize the possible AGN continuum contribution to the nuclear spectrum through examining the time variability of the nucleus in different bands. Nuclear variability has been suggested by several previous observations (Maoz et al. 2005;Seth et al. 2010), but here we provide the strongest evidence yet for broad-band nuclear variability in NGC404. This variability is important both in providing additional evidence for an accreting BH in NGC 404, and for constraining our stellar M/L modeling.
The most compelling measurements to date have been the UV variability presented by Maoz et al. (2005). Comparing their 2002 HST/ACS HRC UV observations to 1994 HST/FOS spectra with an  0. 86 aperture (Maoz et al. 1998) and 1993 precorrection HST/FOC imaging (Maoz et al. 2005), they suggest that the nuclear UV flux at 2500 Åin 2002 was a factor of ∼3 smaller than in the 1993-94 observations; this drop is consistent with the possible AGN continuum visible in their FOS spectra. Because that work had a number of similar data sets where fading was not seen, they argue that this drop in flux is robust despite the very different data sets and large apertures used in the comparison. Variable hot dust emission in the NIR was suggested by S10 based on the comparison of the 2008 NIFS K-band images with 1998 HST/NICMOS NIC2 observations in F160W. Within the central  0. 2, the NIFS K-band image is 30%-40% fainter than the NIC2 observations. This picture is consistent with the observed spectral signature of hot dust emission in the nucleus, but the comparison across bands and instruments leads to significant uncertainties (S10). The observations we present below provide more robust evidence for nuclear variability than any previous sets of observations.
We examine the variability of the nucleus using the new WFC3 images (all taken 2012 November 18) along with images in similar filters from WFPC2 and ACS/HRC ( Table 1). There are repeat measurements in three filters: (1) our WFC3 F336W image can be compared to a previous ACS/HRC F330W image taken 2002 October 28 (Maoz et al. 2005), (2) our WFC3 F547M image can be compared to previous WFPC2 images in F555W (1996 October 21) and F555W (1995 September 06), and (3) our WFC3 F814W image can be compared to a previous WFPC2 F814W image (1995 September 06).
To eliminate mismatches in the background level, we first estimated the sky fluxes in an annulus of  10. 0- 13. 0 away from the nucleus (the maximum radius available in all observations).
These sky levels were then subtracted off of all nuclear photometric measurements. Next, we performed aperture photometry in three radial ranges: the nucleus (<  0. 2), and two annuli surrounding the nucleus, from  0. 2- 0. 5 and  0. 5-1 0. The photometry was converted into luminosities using appropriate zero-points for each observation.
We compute the ratio of luminosities between two epochs to gauge the level of nuclear variability. Figure 1 shows the nuclear luminosities (<  0. 2) increase by 18%-25% in all three bands between 1994 and 2012. Specifically, nuclear luminosities increase ∼18% for F336W, ∼25% for F555W (F547M), and ∼24% for F814Wband. For each filter, we choose one epoch to be the reference epoch (filled circles in Figure 1) and compare the other epochs to the reference epoch. We use a  0. 2 aperture to capture a majority of the nuclear flux; the encircled energy within  0. 2 is 90%, 87%, and 85% for the WFC3 F336W, F547M, and F814Wfilters, respectively. The outer two annuli (triangles and plus signs in Figure 1) can then be used to gauge the variations that might be caused due to bandpass and detector sensitivity. We find variations of <3% at all epochs for these two annuli; we note that formal uncertainties are much smaller than this. Thus, the nuclear variations we find are highly significant (6-8 times the largest variations seen in our annuli), providing the strongest evidence to date for broadband UV-NIR variability in the NGC404 nucleus.
The increase in nuclear UV (F336W) flux we see from 2002-2012 is opposite to the decrease in 2500 ÅUV flux observed by Maoz et al. (2005Maoz et al. ( ) from 1993Maoz et al. ( -2002; this argues against a single transient event (e.g., supernova) being responsible for the variability. Given that the increase in flux at all wavelengths is opposite to apparent decrease in the NIR observed by S10 over overlapping time periods, this suggests that the variability likely occurs on timescales smaller than the decade scales probed here. Finally, we note that the STIS data analyzed below were taken contemporaneously with the WFC3 data. The variability we see here is consistent with the ∼17% contribution of the AGN continuum at 3700 Åthat provides the best-fit to the nuclear spectra (Section 3.2).

Stellar Population Synthesis Models of the Nucleus
We fit a range of single stellar population (SSP) models to the nuclear STIS spectroscopic data to determine the ages of stars andM/Lspatial variations within the NGC404 NSC. We show that the nuclear spectrum is somewhat better fit with the addition of an AGN continuum component, and use this as our default model in the nuclear regions. The stellar population of the NSC shows evidence of a range of ages, but an intermediate

SSP Model Fitting Methodology
We fit the nuclear STIS spectroscopy to two SSP models including (1) the Bruzual & Charlot (2003, BC03)

models, and
(2) the updated version of the BC03 models incorporating the asymptotic giant branch (AGB) models (Marigo & Girardi 2007;Marigo et al. 2008) obtained from Stéphane Charlot, which we refer to as CB07 models. These models include spectra for single age simple stellar populations over a grid of ages and metallicities at a spectral resolution similar to our observations. As in S10, the ages used were 1, 10, 50, 100, 300, 600, 1000, 2500, 5000, 10,000, 13,000 Myr. In the absence of knowledge about the age-metallicity relation, we assume a relation similar to that of M54, the nucleus of the Sgr dwarf galaxy (Siegel et al. 2007); the metallicity is assumed to be Solar while at the oldest age it drops to Z= 0.0004 ([M/H] = −1.7). We note that Bresolin (2013) find the present day metallicity in the outer disk of NGC404 is ∼80% Solar; given the BC03 models available and the likelihood of enhanced metallicity at the center of our galaxy, this justifies our use of Solar metallicity templates for younger ages.
We also correct for the poor wavelength calibration of the STELIB library used by BC03 and CB07, as identified by Koleva et al. (2008). To remove these issues, we compare 100 Åsections of the observed spectrum to the BC03 models to derive a velocity offset (relative to the systematic velocity of NGC 404). We find the velocity corrections to be±35 km s −1 . We then fit the BC03 and CB07 models to our STIS spectroscopy using the Simplefit program (Tremonti et al. 2004), which uses a Levenberg-Marquardt algorithm to find the best-fit parameters and scales of the input model spectra by performing a c 2 minimization. The program also calculates the extinction of the STIS spectroscopy by using the simple model for dust absorption using the prescription of Charlot & Fall (2000). We exclude regions around expected emission lines from the stellar population fit.
To optimize our fit, we fit the flux-calibrated spectrum between 3500 and 5700 Å. We choose this wavelength range by examining the c 2 of the nuclear fit (with and without an AGN component, see below) as we change the blue end of the fit where the data have a lower S/N. We vary the blue end of the fitted wavelengths from 3000 to 4500 Åand find that the optimal blue extreme wavelength is at 3500 Åfor both BC03 and CB07 SSP models. The best-fit for the BC03 templates had a reduced c 2 of 0.98 (670 degrees of freedom).

Nuclear Spectrum and Testing an AGN Continuum
Based on the evidence for variability we found in the previous section, it appears an accreting BH may contribute to the continuum flux near the center of the galaxy. We therefore test whether including a power-law AGN component provides Figure 1. Evidence of nuclear variability in several filters over a 15 year period. Open purple, blue, and red circles represent nuclear luminosities (<  0. 2) in the F336W, F547M, and F814Wfilters, respectively; each luminosity is presented as a ratio relative to the epoch shown with filled circles (L R ). The upside-down triangles and five-pointed star are the ratios of luminosities in two annuli surrounding the nucleus with radii of (0 2-0 5) and (0 5-1 0) respectively; these show much less variation than the nuclear flux, revealing that differences between filter responses between different cameras and sky subtraction issues affect the measurements below the 3% level. a better fit to the STIS data of the nucleus. We fit the nuclear spectrum (  0. 05) using the SSPs described both with and without an AGN continuum component: . To determine the best fit value of α, we fit the spectrum over the entire STIS wavelength range; α is varied from −4.0 to 4.0 in steps of 0.1, and the resulting is evaluated for each model. We find the best fit of a = -+ 0.5 ; 0.3 0.4 as shown in Figure 2, these values provide the best fit over the full wavelength range, and also in the UV. The left panel of Figure 2 shows the c 2 of the whole spectrum as well as the c 2 in just the UV part of the spectrum (l < 4000 Å) where the AGN is expected to contribute the largest fraction of the flux.
We then compare a fit with an AGN component with a = 0.5 to one without an AGN component and find that the inclusion of an AGN component provides a better fit. The residuals of these two fits are shown in the right panel of Figure 2; while both provide visually similar fits, some clear differences are seen. Most notably, the Balmer emission lines in the non-AGN case are stronger and show a similar strength at higher orders (an unphysical scenario). More quantitatively, the non-AGN Balmer line ratios are Hγ/ bH 0.48, Hδ/ bH 0.34, and  H / bH 0.49. In the AGN case, the higher order Balmer lines are weaker with Hγ/ bH 0.47, Hδ/ bH 0.24, and  H / bH 0.15. These ratios are consistent with Osterbrock (1989) (Case A recombination with T∼5000 K) which gives Hγ/Hβ∼0.458, Hδ/Hβ∼0.250, and H /Hβ∼0.153. We note however that similar line-ratios to the non-AGN fits are seen at larger radii where a power law component would not be expected (e.g., Figure 3, right panel).
We note that these emission lines were excluded from the fit and thus do not contribute to the c 2 value, but the left panels show the improved c 2 and residuals for the fit including an AGN component. To quantify how significant this difference is, we calculate the Akaike Information Criterion (Cavanaugh 1997;Burnham & Anderson 2002, p. 100) corrected for finite sample sizes (AICc). Given the best-fit c 2 value of 206 for the model with an AGN and c 2 of 209 for the model without an AGN, the AICc suggests that the AGN model is ∼5 times more probable than the no AGN model. While this alone is not compelling evidence for an AGN component, the somewhat better fits combined with the strong evidence for nuclear variability presented in the previous subsection lead us to include an AGN power-law component for our population synthesis fits at radii <  0. 2. Nuclear Line Emission: From the residual spectrum of the central three pixels (  0. 05 or 0.75 pc) of the STIS long-slit spectroscopy we fit the Balmer b H recombination emission line with a Gaussian and find a flux of ( ) 2.8 0.3 -10 16 erg s −1 cm −2 . In S10, the total nuclear Hβ flux in an aperture of ´ 1. 0 2. 3 was~´-1.08 10 13 erg s −1 cm −2 , thus the fraction of Hβ flux in the nuclear spectrum is only 0.2% of the total emission in the nuclear region.
From the nuclear b H flux we can estimate the nuclear a H flux to use for comparison to the X-ray luminosity and to estimate a nuclear star formation rate (SFR). We assume the Balmer decrement measured by Osterbrock (1989) of Hα/Hβ 10 37 erg s −1 . We compare the nuclear a L H to this X-ray luminosity to determine its consistency with AGN emission. From this Hα flux, we can predict the X-ray flux using the empirical relationship of Panessa et al. (2006), with revised fit by Nguyen et al. (2014). The observed X-ray luminosity is consistent with the predicted flux from this relationship . The uncertainty in the prediction here is very large but shows that the emission line signal we see is consistent with an AGN source to the X-ray emission. If we instead assume the emission is coming from star formation, we can use the a H luminosity ( ) = á L 7.8 0.1 H 10 35 erg s −1 to estimate an upper limit on the nuclear SFR.
Using the relation of Murphy et al. (2011), we find a nuclear SFR of ( ) ´-4.2 0.4 10 6  M yr −1 . This SFR translates into an expected radio luminosity at 5 GHz of ( ) 1.6 0.2 10 22 erg s −1 Hz −1 (Murphy et al. 2011); this is significantly lower than the observed total radio luminosity of 3.4 10 24 erg s −1 Hz −1 at the center of NGC 404 (K. Nyland et al. 2017, in preparation). The radio morphology shows a compact source consistent within astrometric errors of the photometric and kinematic center of our NIFS data; the unresolved fraction of the total flux is ∼75%, thus, this emission is still far too large to be related to the Hα emission via star formation. Thus, the nuclear Hα emission favors AGNrelated nuclear X-ray and radio emission.

Stellar Population Modeling of STIS Data
To obtain sufficient S/N (>20) for stellar population modeling, we bin the STIS spectrum along the slit. Near the center, the individual pixels (  0. 05) exceed this S/N threshold, but the outermost bins (at =  - r 0. 675 0. 825) extend over  0. 15. All of the bins analyzed here had S/N> 20 at wavelengths longer than 5000 Å. Bins with lower S/N appeared to have significant detector artifacts making stellar population fits unreliable. In this subsection, we focus on our best-fit results-these include an AGN component in the central  0. 2 and use the BC03 models. We discuss the uncertainties in our modeling in the next section.
The best-fit model and residuals to the central pixel are illustrated in the left panel of Figure 3. The model is shown in red with the individual SSP components shown in blue. The details of these stellar populations are shown in Tables 2 and 3. The fit in the central pixel is dominated by intermediate age stars with 70% of the light in the 1 Gyr SSP, 5% of the light in the 2.5 and 5 Gyr SSP, 17% of the light in AGN (see Table 2). Only a small fraction of the light is in younger populations The reduced c 2 is 0.98 for the central pixel and 2.23 for the outer bin; the decrease in the goodness-of-fit at larger radii is likely due to increased contributions from bad pixels. The details of SSP model fits to all the spectral bins are presented in Table 3.
The left panel of Figure 4 shows the light-weighted and mass-weighted ages along the slit. The mean light-weighted age is ∼2.5 Gyr and the mean mass-weighted age is ∼4 Gyr. A large fraction of the light appears to be coming from a 1 Gyr population across the slit. At radii   0. 3 (6 pc) this stellar population contributes ∼70% of the light and mass; this drops somewhat at larger radii (the right panel of Figure 4), continuing the trend seen at larger radii (Bouchard et al. 2010). This mass fraction is higher than the mass fraction of ∼1 Gyr found in the NSC by S10 using the same template fitting method using ground-based optical spectroscopy (see also Cid Fernandes et al. 2004), suggesting that this population is concentrated near the center of the NSC. This central region corresponds to the "extra-light" counter-rotating component observed in S10 and provides additional evidence that this portion of the NSC was formed from externally accreted gas during a merger ∼1 Gyr ago. This suggests that gas from minor mergers can effectively accrete into the very center of the galaxy to form an NSC.

Assessing Uncertainties in the Stellar Population Modeling
We estimate the uncertainty in theM/Lof each spatial bin using a Monte Carlo technique. Specifically, we add in Gaussian random errors to each spectrum and refit 100 times. We then take the standard deviation of the resulting M/L distribution as the error on the M/L in that bin. We find in the central bin, the M/L uncertainties are about 15%, while at the edges the bins haveM/Luncertainties of ∼30%. We propagate these M/Lerrors through the remainder of our analysis; these end up being the dominant error in the final mass maps we create. We now discuss the sources of systematic uncertainties in our stellar population models. The presence or absence of an AGN component significantly changes the youngest stellar populations, but has little effect on the derived M/L. As shown in Table 2, very young populations are inferred in the central pixel when no AGN is fit. However, as discussed above, the featureless AGN spectrum appears to provide a better fit than these models. We also note that the large 1 Gyr fraction is not strongly affected by our inclusion of an AGN component.
We also find that BC03 and CB07 models provide very similar fits because the extra TP-AGB stars present in the CB07 models do not contribute significantly to the spectra shortwards of 6000 Å. The light-and mass-weighted ages are only different by 3%-5% between the CB07 and BC03 fits. Because of this similarity, theM/Lratios in the I band vary minimally, with maximum differences of ∼5%.
To test our approach, we also fit our spectra with pPXF (Cappellari & Emsellem 2004) using the MILES models for a subset of 50 logarithmically spaced ages between 1 and 14 Gyr and seven metallicites from −2.32 to +0.22; this gives 350 model spectra (Vazdekis et al. 2010a(Vazdekis et al. , 2010b. We also include fits to gas emission lines. Because of a large number of templates, the resulting fit is highly non-unique; to handle this, we use regularization (following Onodera et al. 2012;Cappellari et al. 2013;McDermid et al. 2015). Despite the wider range of templates, we find the pPXF star formation histories (SFHs) also feature a dominant intermediate age (1-3 Gyr) population presented as the blue line of Figure 4. These similarities also suggest our assumed mass-metallicity relationship has not dramatically affected our inferred SFHs. Given the similarity of the fits, we proceed with our Simplefit results.

Color Correlation With Spectroscopic M/L
We examine the correlation between the WFC3 colors and STIS spectroscopic M/L, and use this to create a new color-M/L relation for the NGC 404 nucleus. This is a critical step in improving the BH mass estimate by more accurately modeling the mass distribution in the NGC404 nucleus.
We create three WFC3 color maps of F336W-F547M, F336W-F814W, and F547M-F814Wby taking the astrometrically aligned image pairs and cross-convolving each image with the PSF of the other (e.g., for the F336W-F547M color map, the F336W image was convoluted with the F547M PSF and vice versa). The cross-convolution is necessary to accurately assess color gradients, since different width PSFs can introduce spurious gradients near the center of galaxies. We then create color images using the Vega-based zero points and correcting for foreground extinction ( Schlegel et al. 1998). Figure 5 shows the WFC3 F336W-F814W (approximately U-I) color map of the nucleus of NGC 404. This map shows redder regions resulting from dust extinction on the eastern side of the nucleus while the western half of the nucleus appears to have little internal extinction with bluer areas in this region consistent with younger (500 Myr) populations. This color map is consistent with the WFPC2 F547M-F814Wcolor map in Figure 2 of S10 (which was not cross-convolved). Next, we match up the STIS slit to the WFC3 color images to enable comparison of the spectroscopicM/Ls with the broadband colors. The images were aligned with the STIS slit as described in Section 2.5. The white lines in Figure 5 show the location of the STIS slit on the F336W-F814Wcolor map. Over the region where spectroscopicM/Lvalues have been determined, the F336W-F814Wcolor varies by more than 1 mag due to dust and stellar population variations. To account for the spatial variation of M/Ldue to both stellar population and dust extinction, we use the effective M/L(M/L eff ) that includes dust extinction using the prescription of Charlot & Fall (2000) and the best-fit t V from our models. The I band M/L eff along the STIS slit is shown in the bottom panel of Figure 6, while the cross-convolved F336W-F814Wcolor along the slit is shown in the top panel. The M/L eff is highest in the reddened regions on the eastern side of the nucleus. Both the color and M/L eff show a minimum at the center, likely due to the contribution of an AGN continuum component. Figure 7 shows the correlation of the WFC3 F336W-F814Wcolor map with the spectroscopicM/L eff values. We fit this correlation to get a color-M/Lrelation; the best-fit is the red solid line. The idea behind this approach is the same as in Bell & de Jong (2001), Bell et al. (2003), and Zibetti et al. (2009), but appropriate to the exact populations present in the NGC404 nucleus as determined using the STIS population synthesis fits. We fit a linear relation between the logarithm effective spectralM/L eff and the F336W-F814Wcolor. The error in this relation is determined in two ways: first, we propagate the errors on the spectroscopicM/Ldetermined above, and second, we determine bootstrap errors of the fit by refitting using random sampling with replacement. The first method yields significantly larger errors than our bootstrapped errors, which are minimal. We calculate the 1σ uncertainty on our color-M/Lrelation from our total error budget, and show these as the thin blue (steeper) and yellow (shallower) lines in Figure 7, respectively. We will examine the effect of our color-M/Lrelation uncertainties on the dynamical models in Section 4. Figure 7 also compares our best-fit color-M/Lrelation to the one predicted by Bell et al. (2003) for the Sloan i band based on the Sloan u−i color. Even accounting for filter  transformations, the slope of the Bell et al. (2003) relation (purple thick solid thin line) is significantly flatter than our best-fit relation and predicts a much higher mass than our derived relation. We also plot a color-M/Lrelation calculated in our filters by Roediger & Courteau (2015, green line), also using BC03 models with a Chabrier IMF. This relation has a similar normalization (as expected given the similarities in the models used), but is significantly steeper than our relation. The differences in slope play the most critical role in affecting our dynamical models, as the overall normalization is scaled to fit the kinematics. The differences in slope of our relation from the Roediger & Courteau (2015) and the Bell et al. (2003) relation are likely due to the unusual SFH in the NGC404 nucleus, and suggest that knowing the local SFH provides us with important information in mapping out the stellar mass distribution in the nucleus.
As we discussed in Section 3.2.2, the fraction of young stars in our fits near the center of the galaxy depends significantly on the allowed contribution of the AGN (∼17%) to the spectrum. To quantify the effect this has on our M/L, we re-fit the color-M/Lrelation using the non-AGN fits for the central spectral bins (thin black line in Figure 7). The M/L  s from these fits are on average 7% larger than fits including the AGN (column 17 in Table 3), and this leads to a slightly shallower slope in the color-M/Lrelation, but the difference is within our 1σ uncertainties.  Note. Best-fit stellar population models of binned STIS spectrum. Column 1: radii included in bin; particularly, the five central most data points are single pixels, while the bins at larger radii combine multiple pixels. We therefore use the notation of±(r 1 -r 2 )″ to specify the size of those spectra bins. Column 2: l (light fraction) and m (mass fraction) for each best-fit model. Column 3: AGN component; ×means no AGN component was included in the fit. Columns 3-12: SSP model age light and mass fractions. Column 13: Attenuation of starlight by dust. Columns 14-15: M/L I for each spectral fit using the BC03 and CB07 SSP models. Column 16: reduced c 2 of each bin associated with BC03 SSP model. Column 17: the best-fit M/L  assuming no AGN contribution for the five central spectral bins.

Creating a Mass Map and Mass Model
We describe the creation of a mass map and model for the NGC404 nucleus in this section. The first step in this process is to produce anM/L eff map for the nucleus by applying an M/L eff -color relation to the color map. The M/L eff map in F814Wis shown in the left panel of Figure 8. This was created using the fitted correlation of the F814WM/L eff versus F336W-F814W.
It is straightforward to obtain the mass map for the NGC404 nucleus by multiplying theM/L eff map with the F814W luminosity map pixel by pixel. The right panel of Figure 8 shows the WFC3 F814Wmass map of the nucleus of NGC404. Because the mass map is a weighted version of the F814Wluminosity map, its PSF should be well described by the F814WPSF. Comparison of the white (mass map) contours with the red (luminosity) contours shows that the mass distribution in the nucleus is more symmetric than the F814Wlight emission profile. This is because theM/L eff map is corrected for dust extinction on the eastern side of the nucleus. We note that the central M/L eff value from spectroscopy is very close to the best-fit relation, and thus we make no special correction to the map due to the presence of the AGN. The mass profile of the central regions of NGC404 is shown in Figure 10.
To create a mass model that can be used in dynamical modeling, we need to deconvolve the effects of the PSF on the mass map. We do this using a GALFIT model with three Sérsic components, the inner two to fit the complicated NSC, and the outer to fit the galaxy bulge. Because our mass map extends out to only 25″, we constrain the shape of the outermost bulge Sérsic component to the values derived by S10: Sérsic index n 3 =2.5, r eff, 3 =45″ (675 pc), and position angle P.A.=80°, while leaving the normalization and flattening of this component as free parameters. These parameters are then fit to the 2D mass map along with all parameters of the inner two Sérsic profiles using GALFIT. The best-fit parameters along with error estimates propagated from the M/Lerrors of the color-M/Lrelation are shown in Table 4. The mass map, mass model, and residuals are shown in Figure 9. The contours of mass density in each panel show both data (black) and model (white) at the same densities to highlight the regions of agreement and disagreement between data and model. These show that our mass model is more symmetric than our luminosity model. Our color-M/Lrelation has yielded a fairly symmetric mass distribution by up-weighting the M/Lin the dusty regions east (left) of the nucleus, and down-weighting the M/L where there are young stellar populations to the west (right) of the nucleus. This increased symmetry shows the success of our color-M/L relation. Figure 10 shows the 1D radial mass profile of the mass map versusthe GALFIT model with the three components shown individually; the residuals in the bottom panel are <10% over the inner region. This is similar to the level of residuals in the surface brightness fit presented in S10. Both model and data profiles were determined using the IRAF task ellipse (Jedrzejewski 1987).
To incorporate this mass model into our dynamical modeling, we need to create a multi-Gaussian expansion (MGE) of the model (Emsellem et al. 1994). We do this by decomposing the GALFIT model into individual MGEs using the MGE_FIT_1D code by Cappellari (2002); the result is shown in Table 5. We obtain very similar results by parametrizing the PSF using an MGE and then directly fitting the 2D mass map, but prefer fitting the GALFIT model because of the superior handling of the PSF (it allows one to input the actual PSF as a 2D image). We fit our MGE out to ∼25″.
We compare our mass MGE to the mass MGE created in S10 from the surface brightness profile without anyM/Lvariations in Figure 11. The two mass profiles are consistent with each other to within 7% between  0. 27 and  25. 00 (4-375 pc). However, at the very center, the new model is about 10% lighter than the original best-fit mass model.
To propagate the error in the color-M/Lrelation into our mass maps, we also create mass maps using the 1σ uncertainties on the color-M/Lrelation. We also created mass maps from other color-M/Lrelations for other filters (i.e., in F547M and F336W). In general, we find similar results from all maps except for those that we calculate an F336WM/L. In these cases, excess F336Wlight at the center is not fully compensated for by the color-M/Lrelations and excess mass is inferred due to the AGN (see column 7 of Table 6).We will discuss the effects these uncertainties have on our dynamical models in Section 4.3.

Stellar Dynamical Modeling
In this section, we present Jeans modeling of the central BH mass of NGC404 using the CO bandhead stellar kinematics (Section 2.3) and incorporating the mass surface density map developed in the previous section. -and mass-weighted mean stellar age of the NSC vs. radius as derived from model fits to the STIS spectra. Right panel: the light-and mass-fractions of the 1 Gyr population in each of these fits; this population dominates the nuclear region, but contributes less at larger radii (S10). The pPXF light fraction results are also presented to demonstrate the level of consistency with our BC03 models.

Stellar Kinematics
We use the Jeans anisotropic modeling (JAM) method and IDL software 8 as the dynamical model to calculate the mass of the central BH, M BH . The model relies on an axisymmetric solution of the Jeans equations incorporating orbital anisotropy (Cappellari 2008). The anisotropy is characterized by a single anisotropy parameter: b s s = -1 z z R 2 2 , where s R and s z are the velocity dispersion in the radial direction and z-direction in ellipsoid aligned cylindrical coordinates. The model calculates the predicted second velocity moment, s = + V V rms 2 2 , where V is the mean stellar velocity and σ is the velocity dispersion based on an MGE model. We note we are using a mass model that differs from the luminosity profile of the galaxy; this difference is incorporated into the JAM model by using separate luminosity and mass profile MGEs; however both are axisymmetric and thus may not fully capture the variations in kinematics due to e.g.,dusty regions. We also note that because of the proximity of NGC404, the kinematic maps have some contribution from partially resolved stars that makes the dynamical maps appear less smooth, especially at larger radii. The model has four parameters: (1) the mass of a central BH, M BH , (2) the mass scaling factor γ; while for a normal JAM fit, this would be the dynamical M/L, in our case this parameter is the ratio between the dynamical M/Land the M/Lpredicted by our stellar population fitting (which assumes a Chabrier IMF), (3) the anisotropy, b z , and (4) the inclination angle, i.
The models were run over a grid in these parameters, the M BH range is 3×10 4  M -3×10 6  M and is gridded in step of D log M BH =0.1, γ is gridded in steps of 0.015 between 0.50 and 2.25, b z ranges from −1 to +1 with a step of 0.05, and   Figure 7). 8 Available from http://purl.org/cappellari/software. the inclination runs from the minimum value allowed by our MGEs, i=17°up to 31°. 5 in step of 0°.5. We compute the models on a grid of b z , M BH , γ, and i. At each grid point we evaluate the c 2 of the the predicted V rms relative to the kinematic data. We run the dynamical modeling to fit these four parameters (b z , M BH , γ, i) in 2D using 920 kinematic data points (degrees of freedom, dof) within a radius of~ 1. 5 of the center. Figure 13 shows c 2 contours as a function of M BH versus the anisotropy parameter b z (left), scaling factor γ (middle), and inclination angle i (right) after marginalizing over the other two parameters. The minimum reduced c~1.26 r 2 is found at M BH =7.0 10 4  M , b~0.05 z (i.e., nearly isotropic), g = 0.890, and i=20°. The solid contours show the s 1 , s 2 (white), and s 3 (red) levels (or c D = 2.30 2 , 6.18, and 11.83) for two parameters; we choose the confidence limits for two parameters to accurately capture the uncertainties shown in our two parameter plots above after marginalizing over the third and the fourth parameter in each plot. The BH is only detected at the 1σ level. Given the restrictions JAM models place on the orbital freedom of the system (relative to e.g., Schwarzschild models), it is common to use 3σ limits in quoting BH masses (see Section 4.3.1 for more details). Therefore, we use the 3σ upper limit of M BH <1.5 10 5  M . This is similar to (but slightly larger than) the upper limit of the light-follows-mass model in S10 (using a c D = 9 2 limit). Figure 12 shows the 1D V rms versus JAM predictions for our axisymmetric mass model with BHs with different masses. The V rms values shown are bi-weight averaged over circular annuli. These data are compared with lines for different BHs; these show that BHs with M BH 10 5  M do not provide a good fit to the data. Particularly, the innermost bin looks consistent (within the errors) with the 3×10 5  M model. Beyond ∼0 3, the data look consistent with anything <3×10 5  M , and beyond~ 0. 7 there is little difference between the models.
The most significant change in comparison with S10 is the anisotropy parameter, b z . Our best model is more isotropic 0.05 ) than the best fit of b = 0.5 z found in S10. Given the observed isotropy in other galaxy nuclei (e.g., Schödel et al. 2009;Seth et al. 2014), we interpret this as a sign of the success of our mass model (which incorporates variations in M/L eff ) correctly representing the mass distribution in this system. This analysis does not include any gas mass in the nucleus; we show here that we expect this assumption to have a minimal effect. H I gas is present at large radii in NGC404 but has a hole at the center (del Río et al. 2004). There is CO emission from the regions in and around the nucleus: assuming a distance of D=3.06 Mpc, the total molecular gas mass is estimated to be 6×10 6  M  (Wiklind & Henkel 1990). Most of this gas is to the NE of the nucleus, and only a few percent is within the central 10″, thus the amount within the region we are modeling is 10 5  M . This gas mass is less than 1% of the stellar mass in the region we are modeling the kinematics, and based on the molecular hydrogen emission map (Section 5), this gas is likely spread widely across the nucleus; thus we do not expect this component to affect our M/Lor BH mass estimates. We examine the kinematics from the warm molecular hydrogen emission in Section 5.

Uncertainties Due to Mass Models
The confidence intervals in our analysis above are based on the kinematic measurement errors and do not include any systematic uncertainties in the mass model. In this section, we examine the mass model uncertainties by analyzing additional, independent mass model images.
The primary uncertainty in M BH comes from the central stellar mass profile. We can get a rough uncertainty on the BH mass by examining the uncertainty in the total mass in the central pixel of our mass map. The central pixel in our mass map has a projected stellar mass of5.1 10 5  M (with an area of 1.8 pc 2 ); the dominant uncertainty in this photometric estimate of the central mass is the uncertainty in the central M/L eff . This uncertainty is 20% or ∼1×10 5  M . We can expect our BH mass uncertainty to be comparable to the uncertainty in the mass in this central pixel. We now examine the systematic uncertainty due to our mass map using several methods. Overall, we find that the systematic irrors on our BH mass are all consistent with the 3σ upper limit of 1.5 10 5  M .

Color-M/L Relation Errors
We calculated the 1σ uncertainties on our color-M/L relation, shown as blue and yellow lines in Figure 7. We created mass maps and MGEs from these 1σ steeper and shallower color-M/Lrelations and ran a full grid of JAM models for both. Figure 13 show the resulting 3σ confidence levels. The effect on the BH mass from these models is Figure 7. The mass-to-light ratio-color relation for the NGC404 nucleus. The yaxis shows the effective mass-to-light ratio in F814W(M/Leff) determined from stellar population fits to the STIS spectroscopy, while the x-axis shows the F336W-F814Wcolor determined from WFC3 imaging. Black points show the data for the stellar population fits using the BC03 models, while the thick red line shows the best-fit linear relation to these data. The 1σ uncertainties in the slope of our best-fit linear relation are shown with thin blue (steeper) and yellow (shallower) lines. The black thin line shows the best-fit linear relation to the data when we do not include an AGN component in the five innermost spectral bins during the fitting procedure. Error bars for each point in log(M/Leff ) were determined through a Monte Carlo analysis of the stellar population fits and these errors form the dominant error in our best-fit log(M/Leff ) color relations. The purple thick solid line is the predicted color-M/Lcorrelation from Bell et al. (2003) shifted downwards by 0.3 dex, while the green line shows the Roediger & Courteau (2015) relation. The red data point indicates the NGC 404 center. minimal, probably because even with the errors, the color-M/L relations are quite similar at the bluer colors found near the nucleus. However, the larger changes in γ and i show that these parameters are quite sensitive to the assumed color-M/L relation, and that our formal errors do not capture the full extent of the uncertainty in these parameters.
As noted, the 1σ errors in our relation still do not encompass the color-M/Lrelations derived in previous work. To test the effect of these more extreme color-M/Lrelations, we also derived the stellar mass map based on the Bell et al. (2003) relation (the purple thick solid line in Figure 7) to examine how much the slope in the color-M/Lrelation affects the constraints on the BH mass. The resulting best-fit BH mass is lower, M BH ,Bell03 =3.3 10 4  M , but the most notable difference is the mass scaling factor of g = 0.33; Bell03 this factor of ∼3 is expected due to the higher normalization of the Bell et al. (2003) relation.
We also fit our source using the color-M/Lrelation derived without including an AGN component in the fit. As noted in Section3.3,the central M/L  and NSC mass increase by 7%, which suggests we will derive a less massive BH at the center of NGC404. Using this non-AGN color-M/Lrelation to create our mass model, we get JAM models with a best-fit BH mass of M BH  = 6.3×10 4  M and g = 0.95 fixing the anisotropy to b = 0.05 z and inclination i=20°. Therefore, the assumption of an AGN contribution has minimal effect on our results, with a best-fit BH mass 9% higher than the non-AGN case.
To summarize, the errors in our color-M/Lrelation suggest minimal changes in our best-fit BH mass, and all reasonable models are consistent with our 3σ BH mass upper limit of 1.5 10 5  M . However, substantial changes in the best-fit γ and inclination are seen using different color-M/Lrelations.

Mass Maps from Other Filters
Our default mass model is created using the F336W-F814Wcolor map and the F814Wimage. As another way of gauging the uncertainty in the central mass profile, we also created mass maps using alternative wavelength images and M/L eff versuscolor relations. We then created new MGEs from GALFIT model fits to these maps, and repeated the JAM modeling. We worked on three color maps of F336W-F814W, F547M-F814W, and F336W-F547M, then created an STIS M/L eff versuscolor relation as described in Section3.3,For each color, we created an M/L eff map and mass map for the F336W, F547M, and F814Wfilter. Therefore, in total, we created nine WFC3 mass maps and their corresponding GALFIT model mass maps. We present the best-fit JAM models generated from these mass maps in Table 6.
When using the F547Mand F814Wimages as the basis for our mass maps, our results are fully consistent with the results we obtain for our default assumptions, suggesting that our mass model uncertainties are lower than the uncertainties from the kinematic fitting. However, when using the F336Wimages to make the maps, the excess emission at the center increases the stellar mass density and decreases the best-fit M BH . This effect is likely due to AGN emission and is still only at the 2σ level.
To test any possible systematic errors caused by our GALFIT model, we also directly fit the mass map with an MGE model. This makes fewer assumptions about the shape of the mass profile but requires that we approximate the PSF as an MGE as well. This MGE fit results in a best-fit M BH =8.0 10 4  M (14% higher than the GALFIT model) with the same b z , γ, and i values.

IMF Variations
In calculating our stellar population M/L, we assume a Chabrier IMF, and the best-fit γ of 0.890 suggests our overall IMF cannot be too different from this. Furthermore, Figure 12 shows that radially, the deviations of V rms from the predicted model are at most ∼5%, suggesting radial variations in the mass scaling factor (which depends on V rms 2 ) are at 10%. Nonetheless, variation of the IMF within the nucleus could affect our BH mass estimate. While there is evidence for IMF variations at the centers of elliptical galaxies (Cappellari et al. 2012;Conroy & van Dokkum 2012;Martín-Navarro et al. 2015), there is little knowledge of IMF variations on the parsec scales probed here. The only direct information on the IMF in NSCs comes from our own Milky Way, and at times the IMF near the center of the Galaxy has been suggested to be very shallow at the high-mass end relative to standard IMFs (e.g., Bartko et al. 2010). However, the more recent work of Lu et al. (2013) based on AO observations of individual stars suggests a modestly shallower IMF for young stars in the Milky Way NSCs of µ - dN dM M 1.7 0.2 . We note that for older stellar populations, both shallower and steeper IMFs increase the M/L(e.g., Cappellari et al. 2012). Given the low inferred BH mass, there is little room for such an IMF at the center of NGC404.

Additional Sources of Uncertainty
In this section, we discuss additional sources of uncertainties due to our dynamical modeling techniques and the assumed center.

Dynamical Modeling Uncertainties
We have chosen to dynamically model our stellar kinematics with JAMs (Cappellari 2008), which provide good fits to real ETGs on large scales (Cappellari 2016). These models make strong assumptions on the form of orbital anisotropy and have less orbital freedom relative to Schwarzschild models (e.g., van den Bosch et al. 2008). Particularly, JAM models predict only the second moment of the LOSVD, V rms , which is known to provide BH mass constraints that are degenerate with anisotropy (e.g., Binney & Mamon 1982;Mazzalay et al. 2016) as seen in Figure 13. The JAM models incorporate an anisotropy parameter, but this parameter still imposes strong constraints on the allowed orbits which can make the resulting parameter errors too small. Furthermore, we have assumed a constant anisotropy. Despite these shortcomings, our models likely do accurately represent the NSC in NGC404, as the anisotropy has been found to be small and fairly constant in the Milky Way, M32, and CenA (Verolme et al. 2002;Cappellari et al. 2009;Schödel et al. 2009), as well as the more distant M60-UCD1 .
JAM models give BH mass estimates consistent with Schwarzschild models (Cappellari et al. 2009Seth et al. 2014;Thater et al. 2016) and high precision maser disk measurements (Drehmer et al. 2015). Schwarzschild models, which include all physical orbits of stars within a given potential, do have larger, more realistic, uncertainties. Seth et al. (2014) find errors in BH mass from M60-UCD1 from JAM models that are 2-3 times smaller than the errors from the Schwarzschild models.

Central Position Uncertainties
We discuss the uncertainties caused by our assumption of the dynamical center position of the galaxy. The choice of the center can have a large effect on BH mass determinations (e.g., Jalali et al. 2012), and appears to have a strong effect on our gas kinematic models (Section 5). We have determined both a kinematic and photometric center; these centers are offset by about a half a pixel, or ∼0.4pc. Our default stellar dynamical model assumes the kinematic center; when we run the JAM model on the photometric center, we get a best fit BH mass of ∼5.6×10 4  M . This suggests a ∼20% systematic error, similar to the 1σ error bars. We also test to ensure that our mass map extends radially far enough out; we remake our MGEs from mass maps extending out to different radii and find that the results are stable if the mass map has a radius that is larger than 6″. Using an MGE that does not go out as far results in a systematic increase in the BH mass; this increase is ∼30% if we narrow down the mass map radius to  1. 5. To conclude, our constraints on the BH mass appear to be robust with little evidence in our tests for significant systematic errors. Thus, we present our M BH <1.5 10 5  M as a robust 3σ upper limit.

Gas Kinematics Modeling
To independently constrain theM/Lof the nucleus and the mass of the central BH, we model the kinematics of the molecular hydrogen gas. A gas dynamical model of NGC 404 was also constructed by S10. Our models use the same data, but we make some different assumptions to test the robustness of the measured BH mass. We find the models are extremely sensitive to our choice of center, and furthermore, that the gas motions near the center suggest the gas is either distant from or not in rotation around the nucleus. We therefore suggest the NIR emission from molecular hydrogen gas cannot be used to constrain the BH mass in this object.
For the stellar potential, we use the MGE model derived in Section 3.4. As with the JAM models presented above, the deconvolved stellar mass model is multiplied with an additional scaling inM/L. The BH is modeled as a point source.  Notes. GALFIT mass map model parameters of WFC3 F814Wmass map. Columns 1 and 2: component name and its label (see Figure 10). Column 3: component index profile. Columns 4 and 5: effective radius or half-light (half-mass in the case of mass map) of the component profile in pc and ″, respectively. Column 6: position angle. Column 7: ratio of major axis and minor axis. Column 8: mass of each component. a Parameter values without error bars mean that we fix those parameters during GALFIT.

Thin Disk Tilted Ring Models
We model the H 2 1-0 S(1) transition kinematics with a tilted ring model, similar to the modeling approach used in S10. In summary, our models assume that the H 2 emitting gas is rotating in thin rings on circular orbits around the center of the nucleus of NGC 404. At each radius, the velocity of the gas traces the enclosed mass in the spherical symmetry approximation (see e.g., Neumayer et al. 2007) Each ring of gas can have its own inclination and position angle. We constrain these parameters for rings in the outer parts by fitting ellipses to the velocity field with KINEMETRY (Krajnović et al. 2006). Within  0. 1 of the center of the nucleus, where the PSF is more important than in the outer parts, we optimize for the best values. As was found by S10, there is a kinematic twist close to the center, where the position angle (P.A.) changes by 70 • .
For our model we generate a spectral cube with the same spectral sampling as the NIFS IFU, but spatially oversampled by a factor of 10. For each ring, we distribute its flux over the cube according to its spatial distribution and the velocity along the ring to mimic our observations. We also spectrally convolve the flux of each ring by a Gaussian with a width of 20kms −1 to mimic the intrinsic dispersion of the disk. We then convolve each spectral slice with the oversampled NIFS PSF and resample the cube in order to compare it with our data.
Contrary to S10, we fit our dynamical models directly to the kinematic data without first symmetrizing them to enable more reliable data versus model comparisons. We calculate the c 2 for both the predicted velocity field and the predicted   Table 4). The radial mass surface density profiles for both data and model are extracted using the IRAF task ellipse. Bottom panel: percentage difference between the radial mass surface density profile and its GALFIT model.  Note. MGE fit for our best-fit model, created from a GALFIT model of the mass map shown in Figure 9. Column 1: component number. Column 2: central surface density (Gaussian amplitude). Column 3: Gaussian width (in arcseconds) along the major axis. Column 4: axial ratio.
dispersion field. However, since the measured velocity field is more constraining than the dispersion field, we use only the c 2 of the velocity field to determine our best-fit model. To optimize our fit for the BH mass, we follow S10 in only using the central 12×12 pixels for our c 2 determination. Using a wider field for the calculation of c 2 gives qualitatively the same results. S10 modeled the surface brightness of the H 2 gas as two exponential disks. In reality, the distribution of the gas is very irregular, as is visible in the H 2 intensity contours in Figure 14. We therefore deconvolve the line flux derived from the IFU data of the H 2 gas by fitting a number of Gaussians to the peaks in emission close to the center with GALFIT. This different flux model leads to slightly different gas dynamical models compared to those of S10. However, this improvement in the modeling did only marginally change the best-fit BH mass.

Model Grid
S10 found a best-fit inclination of 37°for the gas disk. We allow the inclination to vary between 17°and 47°, in steps of 5°. As with the stellar population models, we include an additional mass scaling factor G G SP , which was allowed to vary between 1.0 and 1.5 in steps of 0.05. We used BH masses between (0-)4.5 10 5  M in steps of2.5 10 4  M . The exact location of the center of NGC404 is uncertain. The photometric center, which was derived in S10 by correcting the Gemini/NIFS continuum emission for the emission of hot dust, is offset by 0.37 pc (  = 0. 025 0.5 pixels) from the kinematic center. We therefore run our grid of models for two different centers. Figure 15 shows the results of H 2 1-0 S(1) gas kinematics modeling. When using the kinematic center, as was done by S10, we derive a BH with mass ∼-+ 2.0 10 0.75

Results
This is two times lower than the gas dynamical mass estimate found in S10, but within the errors of that previous determination. However, when using the photometric center we find that we cannot dynamically confirm the presence of a BH, i.e., our models are consistent with no BH. The best-fit mass scaling factor of the cluster is 1.325±0.075. This is significantly higher than the mass scaling factor from the stellar dynamical models ( -+ 0.890 0.060 0.045 ). The reduced c r 2 of our fits are high: c = 109 r 2 for the kinematic center, and c = 164 r 2 for the photometric center. Although it is possible that we have underestimated the uncertainties on the velocity data, the most likely contribution to the high c r 2 is systematic; the models simply do not provide a good fit to the data. In particular, large residuals in velocity are found close to the center of the cluster (Figure 14). It is not possible to remove this (positive) velocity peak by shifting the center. We note that this residual peak was also visible in the residual map of S10 (see their Figure 16).
Since shifting the center of our models by 0.4 pc (half a pixel) has a huge influence on the c r 2 and can either introduce or take away the necessity of a BH in our models, we do not consider the 2×10 5  M for the mass of the BH a significant result, especially since the peak in the dispersion map of the H 2 gas does not coincide with either the kinematic or photometric center. Fitting the H 2 gas with two disks, one with position angle (P.A.) of 50°and one with a P.A. of - 20 can reproduce qualitatively some of the features seen in both the velocity map (the double-lobed structure) and the off centered peak seen in the dispersion map. However, we find that it is still not possible to model the positive central peak in velocity. It is likely that part of the observed motion of the H 2 gas is therefore not tracing the potential, and might be either due to infalling or outflowing gas, or foreground gas at larger galactocentric radii. This may also explain the mismatch in the mass scaling factor between the best-fit gas dynamical and stellar dynamical models.

The NGC404 BH
Previous work has already provided strong evidence for the presence of an accreting BH in the nucleus of NGC404, and our observations of nuclear variability (Section 3.1) and blue continuum in the STIS spectrum (Section 3.2) have added to this evidence. Our dynamical constraints provide a robust upper limit of 1.5×10 5  M on this BH. We note that our upper limit makes the NGC404 BH consistent with the radio non-detection of a compact core in NGC404 by Paragi et al. (2014); they place a ∼3×10 5  M upper mass limit on the BH based on fundamental plane measurements. NGC404 is thus unique; it contains an MBH whose mass is dynamically constrained to be 10 5  M . This alone can provide some evidence on the mechanism that formed BHs in the early universe (e.g., Volonteri 2010; Greene 2012). The dynamical upper limits on possible BHs in M33 and NGC205 are an order of magnitude lower than for NGC404 (Gebhardt et al. 2001;Merritt & Ferrarese 2001;Valluri et al. 2005), however, there is no evidence that BHs exist in either of these objects. The lowest mass BH with a dynamical measurement is in NGC4395 (4×10 5  M ; den Brok et al. 2015). However, some similar BH estimates do exist for very faint galaxies detected as broad-line AGNs  . Top panel: comparison of the final MGE model constructed from our nuclear mass map multiplied by our best-fit mass-scaling factor (g = 0.890, purple line; see Section 4 for details) to that of S10 (red dash line) who assumed a constantM/L; the mass is obtained by multiplying by the S10 light MGE by their best-fitM/Lof 0.70. Bottom panel: ratio of the two mass maps. Note. Exploring the systemic uncertainty in our dynamical models using mass maps created with different filters and colors. Column 1: number of nuclear mass map. Column 2: color used to create the color-M/Lrelation. Column 3: HSTfilter used in creation of the mass map. Column 4: method used to create MGE; Galfit means that mass maps were fitted to a Galfit model which was then fit to an MGE model, "Direct" means that the MGE was created directly from the mass map. Columns 5, 6, 7, and 8: best-fit BH mass (M BH ), anisotropy (b z ), scaling factor (γ), and inclination angle (i) of each best-fit model determined from Jeans models. All error bars are determined by propogating the uncertainty in the color-M/Lrelation through the dynamical models (e.g., the blue and yellow lines in Figure 7 and Figure 13). Figure 12. 1D V rms vs. JAM prediction of BH mass models. All models are fixed to the best-fit anisotropy parameter (b = 0.05 z ), mass scaling factor (g = 0.890), and inclination angle (i = 20°). The data are binned radially. The models show the JAM model predictions at a range of BH masses; the data clearly favor a low BH mass 10 5  M . enhanced in the center. S10 found that the cluster counterrotates between this inner portion and the outer portion of the NSC; the new results suggest that the counter-rotating component may be a 1 Gyr old addition to the preexisting NSC.

Conclusions
We present a new analysis of the nucleus of NGC 404 with data from HST/WFC3 and STIS. We use these data to analyze the stellar populations of the nucleus and combine this with kinematic data from Gemini/NIFS to constrain the mass of the BH and NSC at the center of NGC 404.
1. We develop a method to incorporate variations in the stellar M/L into the dynamical modeling to constrain the BH mass in NGC 404. Specifically, we use spectroscopically determined M/L spatial variations of the nuclear region and use these to create M/L versus color relations appropriate to the local stellar populations present in the nucleus. We then use these relations to create a mass map of the nucleus. The spatial variabilities in M/L are thus directly incorporated into our dynamical model. Incorporating stellar population-based models is critical for getting good dynamical constraints on the lowest mass BHs, as most of these are located in NSCs with complicated and spatially varying stellar populations. 2. Our derived color-M/L relation is inconsistent with previously published relations. It is steeper and much lighter than the relation from Bell et al. (2003), while it is shallower than the color-M/L relations of Roediger & Figure 14. Comparison of the velocity fields of our best-fit kinematic center model (left) with the Gemini/NIFS H 2 velocity field (center) and velocity residuals (right). The black contours denote the dust-emission-corrected K-band image of the NSC from S10. White contours show the intensity of the H 2 gas. Despite being optimized for fitting the central 12×12 pixels, the worst residuals are found at the very center of the cluster. Figure 15. The best fitM/Land BH mass for the kinematic center and the photometric center. M/Lis expressed as the ratio of the dynamical mass and the the stellarM/Lfound by fitting SSP models to the STIS data (the same γ parameter fit using the JAM models). Red dots show our model grid. c 2 at each grid point is optimized for the inclination. Contours show the 1σ (thick line) and 3σ model space allowed by the data.  Thornton et al. 2008) are shown with specific name tag.