Note: This piece is a quick overview of data analysis, my experience with it, and a summary of R packages for water analysis. It does not cover everything; if you have questions, other interests, or suggestions, I welcome and appreciate them.

What is data analysis?

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. - Wikipedia

Data analysis is like cooking:

  • knowing what you want to cook (goal or final product)
  • deciding on a recipe (math, model, and algorithms)
  • preparing ingredients (raw data) and tools (packages or libraries for cleaning and processing data)
  • cooking (running the code)
  • seasoning (writing comments or notes)
  • garnishing (data visualization)
  • reflecting: is the dish what we wanted? (evaluation or validation)

The figure of data science process from Wikipedia

The most frequent questions and errors in programming

Check your environment
  • Check your working directory: getwd()
  • Make sure your global environment is empty, rm(list = ls()), or that the existing variables are not ones you are about to reuse, ls()
Check your data
  • Check the data type (character, numeric, integer, logical, complex) and structure (atomic vector, list, matrix, data frame, factor): typeof(), class(), str()
  • Check the dimensions: dim(), nrow(), ncol(), length()
  • Look at the data: head(), tail(), summary()
  • Make sure the month and day are still correct after the data is read
  • Check for missing data: anyNA(), is.na(), summary()
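
The checks above can be run together on any data frame. Here is a minimal sketch in base R; the data frame `flow` and its columns are made up for illustration and are not from any real gage.

```r
# Hypothetical example data: daily discharge with one missing value
flow <- data.frame(
  date = c("2018-01-01", "2018-01-02", "2018-01-03"),
  discharge_cfs = c(12.5, NA, 14.1),
  stringsAsFactors = FALSE
)

getwd()                # where R reads and writes files by default
ls()                   # objects currently in the global environment

class(flow)            # class of the object
str(flow)              # type of every column at a glance
dim(flow)              # number of rows and columns
head(flow)             # first rows
summary(flow)          # ranges, plus NA counts for numeric columns

anyNA(flow)            # is any value missing?
colSums(is.na(flow))   # number of missing values per column
```

Note that `date` is still a character column here; it only becomes a real date after an explicit `as.Date()` call.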

An example:

## Example of why data type and date format matter
library(dataRetrieval) # load USGS library

# download successfully
siteNumber <- 16240500
startDate <- "2010-01-01"
endDate <- "2018-12-31"
readNWISpeak(siteNumber, startDate, endDate) # Get peakflow data

# download successfully
siteNumber <- 16240500
startDate <- "01/01/2010"
endDate <- "12/31/2018"
readNWISpeak(siteNumber, startDate, endDate)

# download with wrong day and month
startDate <- "18/12/2010"
endDate <- "12/31/2018"
readNWISpeak(siteNumber, startDate, endDate)

# download unsuccessfully
siteNumber <- 16240500
startDate <- as.factor("2010/01/01")
endDate <- as.factor("2018/12/31")
readNWISpeak(siteNumber, startDate, endDate)
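
One way to avoid the day/month mix-up and the factor problem shown above is to convert dates explicitly before passing them on. The following is a small base R sketch and not part of dataRetrieval; the variable names are only for illustration.

```r
# The same calendar day written three ways; the format string tells R
# which part is the day and which is the month.
iso  <- as.Date("2010-12-18")                        # ISO 8601, unambiguous
us   <- as.Date("12/18/2010", format = "%m/%d/%Y")   # month first
euro <- as.Date("18/12/2010", format = "%d/%m/%Y")   # day first

iso == us
iso == euro

# Parsing a day-first string with a month-first format gives NA,
# which is easy to catch with is.na()
is.na(as.Date("18/12/2010", format = "%m/%d/%Y"))

# If a date arrives as a factor, convert it to character first,
# then to Date, and format it as plain text for the request
start_factor <- as.factor("2010/01/01")
start_date   <- as.Date(as.character(start_factor), format = "%Y/%m/%d")
format(start_date, "%Y-%m-%d")
```

Passing the formatted "YYYY-MM-DD" string keeps the request independent of how the date was originally typed.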

Import data and input data
  • Know your file format, e.g. .txt, .csv, .xlsx, .html, etc.
  • Text files: read.table(), read.csv(), read.delim()
  • Import Excel data: install.packages("XLConnect") or install.packages("xlsx"). Note: install Java beforehand. XLConnect is a Java-based solution, so it is cross-platform and returns satisfactory results, but it may be very slow for large data sets.
  • Import JSON data: install.packages("rjson")
  • Import XML data: install.packages("XML")
  • Read SPSS, Stata, Systat, and Minitab files: install.packages("foreign")
  • Read SAS files: install.packages("sas7bdat")
  • … (There are many packages out there for different data formats!)
  • Make sure the input to a function is in the format the function expects.
  • Use strsplit() to split strings
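
As a small illustration of the points above, the sketch below reads CSV content and splits a string column with strsplit(). The CSV text and its column names are hypothetical and passed inline, so no file is needed.

```r
# Hypothetical CSV content, passed inline so the example needs no file
csv_text <- "site,datetime,stage_ft
16240500,2018-01-01 00:15,1.20
16240500,2018-01-01 00:30,1.25"

# Force the site ID to stay text; IDs that start with 0 would otherwise
# lose the leading zero when read as numbers
obs <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                colClasses = c(site = "character"))
str(obs)

# strsplit() returns a list with one character vector per input string
parts <- strsplit(obs$datetime, " ", fixed = TRUE)
obs$date <- sapply(parts, `[`, 1)   # first piece: the date
obs$time <- sapply(parts, `[`, 2)   # second piece: the time
obs
```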

An example of reading an Excel file:

library(xlsx)
# Read Excel data by sheet name
Action_cam <- read.xlsx("Pearl Harbor Data.xlsx", sheetName = "Action Cam")
str(Action_cam)
summary(Action_cam)

# Read Excel data by sheet number (use sheetIndex, not sheetName)
Action_cam2 <- read.xlsx("excel_data.xlsx", sheetIndex = 6)

An example for reading a text file with multiple commented lines:

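# Option 1: manually delete the commented lines (remember to keep a copy of the original file)
data_manual <- read.table("USGS_Waihi_WQ_plots_modified.txt", sep = "\t", header = TRUE)

# Option 2: skip the commented lines in R (the original file is not changed)
# comment.char = "#" already drops everything after a "#", so no extra grep() step is needed;
# indexing with -grep() would return zero rows when nothing matches
data_first <- read.table("USGS_Waihi_WQ_plots.txt", header = FALSE, sep = "\t", comment.char = "#", stringsAsFactors = FALSE)
colnames(data_first) <- as.character(unlist(data_first[1, ])) # the first remaining row holds the column names
data_first <- data_first[-c(1, 2), ] # drop the name row and the row that follows it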
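
To try the second approach without the original file, here is a self-contained sketch. The text in `rdb_text` is made up to imitate a tab-separated export with comment lines, a header row, and a column-format row; the real file may be laid out differently.

```r
# Hypothetical tab-separated text: comment lines, a header row,
# a column-format row, then the data
rdb_text <- paste(
  "# Data provided by USGS",
  "# Some explanation lines",
  "agency_cd\tsite_no\tresult_va",
  "5s\t15s\t12n",
  "USGS\t16240500\t7.8",
  "USGS\t16240500\t8.1",
  sep = "\n"
)

# Read everything as text; lines starting with "#" are skipped
raw <- read.table(text = rdb_text, header = FALSE, sep = "\t",
                  comment.char = "#", stringsAsFactors = FALSE,
                  colClasses = "character")

colnames(raw) <- as.character(unlist(raw[1, ]))  # first row holds the names
wq <- raw[-c(1, 2), ]                            # drop name row and format row
rownames(wq) <- NULL
wq$result_va <- as.numeric(wq$result_va)         # convert text to numbers
str(wq)
```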

Where to find help for R? Or learn more?

Packages

Some useful packages

  • devtools: install packages from GitHub
  • reshape2: reshape data
  • zoo: package for time series
  • xts: package for time series
  • dataRetrieval: download USGS gage data
  • markwh/streamstats: API for USGS StreamStats
  • sp: spatial plotting tool
  • rgdal: package for raster data
  • ggplot2: package for plotting nice figures

An example for streamstats:

# devtools::install_github("markwh/streamstats") # an R package for using the USGS StreamStats API
library(streamstats)

# Get basin characteristics and download the watershed shapefiles ---------------
singleInfo <- delineateWatershed(-156.679722, 20.946, rcode = "HI",
                                 includeparameters = "true", includefeatures = "true", crs = 4326)
# Flatten the parameter list into a table; nrow = 57 is hard-coded here,
# so adjust it if the returned list has a different length
singleInfo.unlist <- data.frame(matrix(unlist(singleInfo[[3]]), nrow = 57, byrow = FALSE))
singleInfo.unlist <- singleInfo.unlist[, c(-1, -2)]
colnames(singleInfo.unlist) <- c("Description", "Par", "Unit", "Value")
write.csv(singleInfo.unlist, "delineation_par_test.csv") # write all the parameters to a file

downloadGIS(singleInfo$workspaceID, "delineation_test.zip", format = "shapefile") # download the delineated watershed

An example for ggplot2:

# install.packages("ggplot2")
library(ggplot2)

# Read Excel data by sheet name
library(xlsx)
Action_cam <- read.xlsx("excel_data.xlsx", sheetName = "Action Cam")
# Bar chart of how many times the camera saw each genus
ggplot(data = Action_cam) +
  geom_bar(aes(x = Genus))

## What is the count for each genus?
Genus <- levels(factor(Action_cam$Genus)) # factor() also works if Genus was read as character
sum(Action_cam$Count, na.rm = TRUE) # total count over all genera

# Sum the counts genus by genus
genus_dataframe <- data.frame(Genus = as.factor(Genus), Count = NA)
for (i in seq_along(Genus)) {
  genus_dataframe$Count[i] <- sum(Action_cam$Count[Action_cam$Genus %in% Genus[i]], na.rm = TRUE)
}
# Bar chart of the summed counts
ggplot(data = genus_dataframe) +
  geom_bar(aes(x = Genus, y = Count, fill = Genus), stat = "identity")
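
The per-genus loop above can also be written without a loop. Below is a sketch using base R's tapply(); the data frame `cam` is a made-up stand-in for the Action Cam sheet, and its genus names and counts are hypothetical.

```r
# Hypothetical stand-in for the Action_cam sheet
cam <- data.frame(
  Genus = c("Acanthurus", "Chaetodon", "Acanthurus", "Zebrasoma", "Chaetodon"),
  Count = c(3, 1, NA, 2, 4),
  stringsAsFactors = FALSE
)

# Sum Count within each Genus, ignoring missing counts
totals <- tapply(cam$Count, cam$Genus, sum, na.rm = TRUE)

# Turn the named vector into a data frame ready for ggplot()
genus_counts <- data.frame(Genus = names(totals), Count = as.vector(totals))
genus_counts
```

The result has one row per genus and can be passed to the same geom_bar(..., stat = "identity") call.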

Useless but fun

#install.packages("fortunes") # if you don't already have it
library(fortunes)
fortune()
fortune("memory")
#install.packages("cowsay")
library(cowsay)
say("Meow! Sis! I love R and \"debug\", you know?", by = "bigcat")

Collection of R packages for water

(From: https://github.com/ropensci/hydrology)

Data Retrieval

Hydrological data sources (surface water/groundwater quantity and quality)

  • dataRetrieval: Collection of functions to help retrieve U.S. Geological Survey (USGS) and U.S. Environmental Protection Agency (EPA) water quality and hydrology data from web services.

  • dbhydroR: Client for programmatic access to the South Florida Water Management District’s DBHYDRO database, with functions for accessing hydrologic and water quality data.

  • hddtools: Hydrological Data Discovery Tools. Facilitates discovery and handling of hydrological data, access to catalogues and databases.

  • hydrolinks: Tools to link geographic data with hydrologic network, including lakes, streams and rivers. Includes automated download of U.S. National Hydrography Network and other hydrolayers.

  • hydroscoper: R interface to the Greek National Data Bank for Hydrological and Meteorological Information. It covers Hydroscope’s data sources and provides functions to transliterate, translate and download them into tidy dataframes (tibbles).

  • kiwisR: Wrapper for retrieving data from KISTERS WISKI databases via the KiWIS API. GitHub only package.

  • rnrfa: Utility functions to retrieve data from the UK National River Flow Archive. There are functions to retrieve stations falling in a bounding box, to generate a map and extracting time series and general information.

  • sbtools: Tools for interacting with U.S. Geological Survey ScienceBase data cataloging and collaborative data management platform. Functions included for querying ScienceBase, and creating and fetching datasets.

  • tidyhydat: Provides functions to access historical and real-time national ‘hydrometric’ data from Water Survey of Canada data sources ( http://dd.weather.gc.ca/hydrometric/csv/ and http://collaboration.cmc.ec.gc.ca/cmc/hydrometrics/www/) and then applies tidy data principles.

  • washdata: Urban water and sanitation survey dataset from survey conducted in Dhaka, Bangladesh, part of a series of surveys to be conducted in various cities including Accra, Ghana; Nakuru, Kenya; Antananarivo, Madagascar; Maputo, Mozambique; and, Lusaka, Zambia.

  • waterData: Imports U.S. Geological Survey (USGS) daily hydrologic data from USGS web services, plots the data, addresses some common data problems, and calculates and plots anomalies.

  • WaterML: Lets you connect to any of the Consortium of Universities for the Advancement of Hydrologic Sciences, Inc. (‘CUAHSI’) Water Data Center ‘WaterOneFlow’ web services and read any ‘WaterML’ hydrological time series data file.

Meteorological data (precipitation, radiation, temperature, etc - including both measurements and reanalysis)

  • bomrang: Provides functions to interface with Australian Government Bureau of Meteorology (BOM) data, fetching data and returning a tidy data frame of précis forecasts, historical and current weather data from stations, agriculture bulletin data, BOM 0900 or 1500 weather bulletins or a raster stack object of satellite imagery from GeoTIFF files.

  • countyweather: Interacts with NOAA data sources (including the NCDC API and ISD data) using functions from the ‘rnoaa’ package to obtain and compile weather time series for U.S. counties.

  • getMet: Functions for sourcing, formatting, and editing meteorological data for hydrologic models.

  • GSODR: Provides automated downloading, parsing, cleaning, unit conversion and formatting of Global Surface Summary of the Day (GSOD) weather data from the USA National Centers for Environmental Information (NCEI) for use in R.

  • MODISTools: Programmatic Interface to the MODIS Land Products Subsets Web Services. Allows for easy downloads of ‘MODIS’ time series.

  • rdwd: Handle climate data from the German DWD (‘Deutscher Wetterdienst’).

  • RNCEP: Contains functions to retrieve, organize, and visualize weather data from the NCEP/NCAR Reanalysis and NCEP/DOE Reanalysis II datasets.

  • rnoaa: Client for many NOAA data sources including the NCDC climate API, with functions for each of the API endpoints: data, data categories, data sets, data types, locations, location categories, and stations. Includes interfaces to NOAA sea ice data, severe weather inventory, Historical Observing Metadata Repository (‘HOMR’), storm data via ‘IBTrACS’, tornado data via the NOAA storm prediction center, and more.

  • rpdo: Get Monthly Pacific Decadal Oscillation (PDO) index values from January 1900 to present. See also rsoi for downloading Southern Oscillation Index, Oceanic Nino Index and North Pacific Gyre Oscillation data.

  • rwunderground: Tools for getting historical weather information and forecasts from wunderground.com. Historical weather and forecast data includes, but is not limited to, temperature, humidity, windchill, wind speed, dew point, heat index. Additionally, the weather underground weather API also includes information on sunrise/sunset, tidal conditions, satellite/webcam imagery, weather alerts, hurricane alerts and historical high/low temperatures.

  • smapr: Acquisition and Processing of NASA Soil Moisture Active-Passive (SMAP) Data. Facilitates programmatic access to search for, acquire, and extract NASA Soil Moisture Active Passive (SMAP) data.

  • weathercan: Provides means for downloading historical weather data from the Environment and Climate Change Canada website. Data can be downloaded from multiple stations and over large date ranges and automatically processed into a single dataset. Tools are also provided to identify stations either by name or proximity to a location.

  • worldmet: Functions to import data from more than 30,000 surface meteorological sites around the world managed by the National Oceanic and Atmospheric Administration (NOAA) Integrated Surface Database (ISD).

Data Analysis

Data tidying (gap-filling, data organization, QA/QC, etc)

  • driftR: A tidy implementation of equations that correct for instrumental drift in continuous water quality monitoring data using one or two standard reference values. The equations implemented are from Hasenmueller (2011).

  • fasstr: Functions to tidy, summarize, analyze, trend, and visualize streamflow data. This package summarizes continuous daily mean streamflow data into various daily, monthly, annual, and long-term statistics, completes annual trends and frequency analyses, in both table and plot formats. GitHub only package.

  • climdex.pcic: PCIC Implementation of Climdex Routines. PCIC’s implementation of Climdex routines for computation of extreme climate indices.

  • climatol: Functions for the quality control, homogenization and missing data infilling of climatological series and to obtain climatological summaries and grids from the results. Also functions to draw wind-roses and Walter & Lieth climate diagrams.

  • getMet: Functions for sourcing, formatting, and editing meteorological data for hydrologic models.

Hydrograph analysis (functions for working with streamflow data, e.g. flow statistics, trends, biological indices, etc.)

  • biotic: Calculates a range of UK freshwater invertebrate biotic indices including BMWP, Whalley, WHPT, Habitat-specific BMWP, AWIC, LIFE and PSI.

  • EcoHydRology: This package provides a flexible foundation for scientists, engineers, and policy makers to base teaching exercises as well as for more applied use to model complex eco-hydrological interactions, including some SWAT calibration functions.

  • ecoval: Functions for evaluating and visualizing ecological assessment procedures for surface waters containing physical, chemical and biological assessments in the form of value functions.

  • EflowStats: Calculates a suite of ecological flow statistics and fundamental properties of daily streamflow for a given set of data. GitHub only package.

  • EGRET: Exploration and Graphics for RivEr Trends (EGRET): analysis of long-term changes in water quality and streamflow, including the water-quality method Weighted Regressions on Time, Discharge, and Season (WRTDS).

  • EGRETci: A bootstrap method for estimating uncertainty of water quality trends.

  • FAdist: Probability distributions that are sometimes useful in hydrology.

  • FlowScreen: Screens daily streamflow time series for temporal trends and change-points. This package has been primarily developed for assessing the quality of daily streamflow time series. It also contains tools for plotting and calculating many different streamflow metrics.

  • hydrostats: Calculates a suite of hydrologic indices for daily time series data that are widely used in hydrology and stream ecology.

  • hydroTSM: Functions for management, analysis, interpolation and plotting of time series used in hydrology and related environmental sciences. In particular, this package is highly oriented to hydrological modelling tasks.

  • lfstat: Functions to compute and plot statistics described in the “Manual on Low-flow Estimation and Prediction”, published by the World Meteorological Organisation (WMO).

Meteorology (functions for working with meteorological and climate data)

  • Evapotranspiration: Functions to calculate potential evapotranspiration (PET) and actual evapotranspiration (AET) from 21 different formulations including Penman, Penman-Monteith FAO 56, Priestley-Taylor and Morton models.

  • humidity: Functions for calculating saturation vapor pressure (hPa), partial water vapor pressure (Pa), relative humidity (%), absolute humidity (kg/m^3), specific humidity (kg/kg), and mixing ratio (kg/kg) from temperature (K) and dew point (K). Conversion functions between humidity measures are also provided.

  • MBC: Multivariate Bias Correction of Climate Model Outputs. Calibrate and apply multivariate bias correction algorithms for climate model simulations of multiple climate variables.

  • meteoland: Functions to estimate weather variables at any position of a landscape.

  • musica: Multiscale Climate Model Assessment. Provides functions to compare and analyse time series.

  • qmap: Empirical adjustment of the distribution of variables originating from (regional) climate model simulations using quantile mapping.

  • Rainmaker: Instantaneous rainfall data processing for defining event periods, determination of antecedent rainfall conditions and X-hr intensities. GitHub only package.

Other

  • berryFunctions: Draw horizontal histograms, color scattered points by 3rd dimension, enhance date- and log-axis plots, zoom in X11 graphics, trace errors and warnings, use the unit hydrograph in a linear storage cascade, convert lists to data.frames and arrays, fit multiple functions.

  • GWSDAT: Shiny application for the analysis of groundwater monitoring data, designed to work with simple time-series data for solute concentration and ground water elevation, but can also plot non-aqueous phase liquid (NAPL) thickness if required.

  • hydrogeo: Contains one function for drawing Piper diagrams (also called Piper-Hill diagrams) of water analyses for major ions.

  • kitagawa: Provides tools to calculate the theoretical hydrodynamic response of an aquifer undergoing harmonic straining or pressurization. There are two classes of models here: (1) for sealed wells, based on the model of Kitagawa et al (2011), and (2) for open wells, based on the models of Cooper et al (1965), Hsieh et al (1987), Rojstaczer (1988), and Liu et al (1989).

  • MBSStools: Suite of tools for data manipulation and calculations for Maryland DNR MBSS program. GitHub only package.

  • MODIStsp: Suite of tools to automate the Download and Preprocessing of MODIS Land Products Data. Allows automating the creation of time series of rasters derived from MODIS Satellite Land Products data. It performs several typical preprocessing steps such as download, mosaicking, reprojection and resize of data acquired on a specified time period.

  • lulcc: Classes and methods for spatially explicit land use change modelling.

  • wql: Functions to assist in the processing and exploration of data from environmental monitoring programs. Intended for programs that sample approximately monthly, quarterly or annually at discrete stations, a feature of many legacy data sets. Most of the functions should be useful for analysis of similar-frequency time series regardless of the subject matter.

  • WRTDStidal: An adaptation for estuaries (tidal waters) of weighted regression on time, discharge, and season to evaluate trends in water quality time series.

Spatial data processing

The CRAN Spatial Task View gives an overview of packages to be used in R to read, visualise, and analyse spatial data. See also the ROpenSci MapTools Listing.

  • hydrolinks: Tools to link geographic data with hydrologic network, including lakes, streams and rivers. Includes automated download of U.S. National Hydrography Network and other hydrolayers.

  • lumpR: Functions for a semi-automated approach of the delineation and description of landscape units and partition into terrain components. It can be used for the pre-processing of semi-distributed large-scale hydrological and erosion models using catena-representation (WASA-SED, CATFLOW). GitHub only package.

  • lakemorpho: Lake morphometry metrics are used by limnologists to understand, among other things, the ecological processes in a lake. The ‘lakemorpho’ package provides the tools to calculate a typical suite of these metrics from an input elevation model and lake polygon.

  • Watersheds: Methods for watersheds aggregation and spatial drainage network analysis.

Modeling

Process-based modeling (scripts for preparing inputs/outputs and running process-based models)

See also the RHydro project on R-forge.

  • airGR: Hydrological modelling tools developed at Irstea-Antony (HYCAR Research Unit, France). The package includes several conceptual rainfall-runoff models (GR4H, GR4J, GR5J, GR6J, GR2M, GR1A), a snow accumulation and melt model (CemaNeige) and the associated functions for their calibration and evaluation.

  • airGRteaching: Add-on package to the ‘airGR’ package that simplifies its use and is aimed at being used for teaching hydrology.

  • bigleaf: Calculation of physical (e.g. aerodynamic conductance, surface temperature), and physiological (e.g. canopy conductance, water-use efficiency) ecosystem properties from eddy covariance data and accompanying meteorological measurements. Calculations assume the land surface to behave like a ‘big-leaf’ and return bulk ecosystem/canopy variables.

  • boussinesq: Collection of functions for the One-Dimensional Boussinesq Equation (ground-water).

  • dynatopmodel: Implementation and enhancement of the Dynamic TOPMODEL semi-distributed hydrological model. Includes some preprocessing, utility and routines for displaying outputs. See also topmodel.

  • Ecohydmod: Simulates the soil water balance (soil moisture, evapotranspiration, leakage and runoff), rainfall series by using the marked Poisson process and the vegetation growth through the normalized difference vegetation index (NDVI). See Souza et al. (2016).

  • EcoHydRology: Flexible foundation for scientists, engineers, and policy makers to base teaching exercises as well as for more applied use to model complex eco-hydrological interactions, including some SWAT calibration functions.

  • geotopbricks: An R Plug-in for the Distributed Hydrological Model GEOtop. The package analyzes raster maps and other information as input/output files from the Hydrological Distributed Model GEOtop.

  • hydromad: Hydrological Model Assessment and Development. GitHub only package.

  • hydroPSO: Particle Swarm Optimisation (PSO) algorithm for the calibration of environmental and other real-world models that need to be executed from the system console. hydroPSO is model-independent, allowing the user to easily interface any computer simulation model with the PSO calibration engine.

  • kwb.hantush: Calculation of groundwater mounding beneath an infiltration basin based on the Hantush (1967) equation. The correct implementation is shown with a verification example based on a USGS report (page 25).

  • loadflex: Models and Tools for Watershed Flux Estimates. See paper. GitHub only package.

  • reservoir: Tools for Analysis, Design, and Operation of Water Supply Storages. Measure single-storage water supply system performance using resilience, reliability, and vulnerability metrics; assess storage-yield-reliability relationships; determine no-fail storage with sequent peak analysis; optimize release decisions for water supply, hydropower, and multi-objective reservoirs using deterministic and stochastic dynamic programming; generate inflow replicates using parametric and non-parametric models; evaluate inflow persistence using the Hurst coefficient.

  • RHMS: Hydrologic modelling system is an object oriented tool which enables R users to simulate and analyze hydrologic events. The package proposes functions and methods for construction, simulation, visualization, and calibration of hydrologic systems.

  • RSAlgaeR: Builds Empirical Remote Sensing Models of Water Quality Variables and Analyzes Long-Term Trends. Assists in processing reflectance data, developing empirical models using stepwise regression and a generalized linear modeling approach, cross-validation, and analysis of trends in water quality conditions (specifically chl-a) and climate conditions using the Theil-Sen estimator.

  • streamDepletr: Package for assessing the impacts of groundwater pumping on streams. GitHub only package.

  • streamMetabolizer: Estimate aquatic photosynthesis and respiration (collectively, metabolism) from time series data on dissolved oxygen, water temperature, depth, and light via inverse modeling. The package assists with data preparation, handles data gaps during modeling, and provides tabular and graphical reports of model outputs. GitHub only package.

  • SWATmodel: The Soil and Water Assessment Tool (SWAT) is a river basin or watershed scale model developed by Dr. Jeff Arnold for the USDA-ARS.

  • topmodel: Set of hydrological functions including the hydrological model TOPMODEL, which is based on the 1995 FORTRAN version by Keith Beven. From version 0.7.0, the package is put into maintenance mode. See also dynatopmodel.

  • TUWmodel: Lumped Hydrological Model for Education Purposes: a lumped conceptual rainfall-runoff model, following the structure of the HBV model. The model runs on a daily or shorter time step and consists of a snow routine, a soil moisture routine and a flow routing routine.

  • wasim: Helpful tools for data processing and visualisation of results of the hydrological model WASIM-ETH.

  • water: Tools and functions to calculate actual Evapotranspiration using surface energy balance models.

  • WRSS: Water resources system simulator is a tool for simulation and analysis of large-scale water resources systems. ‘WRSS’ proposes functions and methods for construction, simulation and analysis of primary water resources features (e.g. reservoirs, aquifers, etc.) based on Standard Operating Policy (SOP).

Statistical modeling (hydrology-related statistical models)

The Environmetrics Task View gives an overview of packages used in the analysis of environmental data, encompassing hydrological data, including many statistical approaches used in the ecological sciences. Additionally, packages that help model datasets with extreme values are discussed in the ExtremeValue Task View.

  • CityWaterBalance: Retrieves data and estimates unmeasured flows of water through the urban network. Any city may be modeled with preassembled data, but data for US cities can be gathered via web services using this package and dependencies geoknife and dataRetrieval.

  • dream: DiffeRential Evolution Adaptive Metropolis (DREAM). Efficient global MCMC even in high-dimensional spaces. R-Forge only package.

  • fuse: An R package implementing the Framework for Understanding Structural Errors cvitolo.github.io/fuse/. GitHub only package.

  • hydroApps: Package providing tools for hydrological applications and models developed for regional analysis in Northwestern Italy focused on Flood Frequency Analysis.

  • hydroGOF: S3 functions implementing both statistical and graphical goodness-of-fit measures between observed and simulated values, mainly oriented to be used during the calibration, validation, and application of hydrological models.

  • HydroMe: Estimates the parameters in infiltration and water retention models by curve-fitting method. The models considered are those that are commonly used in soil science.

  • hyfo: Focuses on data processing and visualization in hydrology and climate forecasting. Main function includes data extraction, data downscaling, data resampling, gap filler of precipitation, bias correction of forecasting data, flexible time series plot, and spatial map generation. It is a good pre-processing and post-processing tool for hydrological and hydraulic modellers.

  • IDF: Functions to read precipitation data from German weather service (DWD) files and Berlin station data. Additionally, intensity-duration-frequency (IDF) parameters can be estimated from a given data.frame containing a precipitation time series. IDF parameters are estimated on the basis of a duration-dependent generalised extreme value distribution, and IDF curves based on these estimated parameters can be plotted.

  • NEON-stream-discharge: NEON Stage-Discharge Rating Curve. Instructions to set up a docker container which calculates the stage-discharge rating curve for a site and water year, developed using a Bayesian modeling technique. GitHub only package.

  • LPM: Apply Univariate Long Memory Models, Apply Multivariate Short Memory Models To Hydrological Dataset, Estimate Intensity Duration Frequency curve to rainfall series.

  • meteo: Spatio-temporal geostatistical mapping of meteorological data.

  • nsRFA: A collection of statistical tools for objective (non-supervised) applications of the Regional Frequency Analysis methods in hydrology.

  • RMAWGEN: Functions for spatial multi-site stochastic generation of daily time series of temperature and precipitation.

  • rtop: Interpolation of Data with Variable Spatial Support. Geostatistical interpolation of data with irregular spatial support such as runoff related data or data from administrative units.

  • SCI: Functions for generating Standardized Climate Indices (SCI). SCI is a transformation of (smoothed) climate (or environmental) time series that removes seasonality and forces the data to take values of the standard normal distribution. SCI was originally developed for precipitation. In this case it is known as the Standardized Precipitation Index (SPI).

  • soilwater: Implements parametric formulas of soil water retention or conductivity curve. At the moment, only Van Genuchten (for soil water retention curve) and Mualem (for hydraulic conductivity) were implemented.

  • SPEI: A set of functions for computing potential evapotranspiration and several widely used drought indices including the Standardized Precipitation-Evapotranspiration Index (SPEI).

  • swmmr: Functions to connect the widely used Storm Water Management Model (SWMM) of the United States Environmental Protection Agency (US EPA) to R.

Packages in Python for hydrology

(Please see: http://abouthydrology.blogspot.com/2016/11/python-resources-for-hydrologists.html)

One of the staff plates I installed with my colleague at Lyon Arboretum, Honolulu, Hawaii, USA

Why do we need a staff plate?

We need a staff plate to read the stage (water height) while we measure streamflow to build a rating curve: a relationship between discharge and stage that lets us estimate discharge from stage alone.
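
As a rough illustration of what a rating curve does, the sketch below fits a power-law relationship between stage and discharge. The power-law form Q = a * h^b (with zero flow at zero stage) is an assumption made here for simplicity, and the measurements are made up; a real rating needs field data and often a stage offset.

```r
# Hypothetical paired field measurements: staff-plate stage (ft) and
# measured discharge (cfs). Values are made up and follow Q = 5 * h^2.
stage     <- c(0.5, 0.8, 1.0, 1.5, 2.0)
discharge <- 5 * stage^2

# Assumed rating form: Q = a * h^b, fitted as a straight line in log space
fit <- lm(log(discharge) ~ log(stage))
a <- exp(coef(fit)[[1]])   # back-transform the intercept
b <- coef(fit)[[2]]        # slope is the exponent

# Estimate discharge from a stage reading alone
rating <- function(h) a * h^b
rating(1.2)
```

Once the curve is fitted, each new stage reading from the staff plate or a sensor can be converted to discharge with `rating()`.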

Why do we need a staff plate if we already have a continuous stage sensor (e.g. a pressure transducer)?

A staff plate is a fixed datum that does not shift, so it is a solid reference for streamflow measurements. It also lets us check our measurements when we need to switch or replace the sensor.

So, how do you build a staff plate? Below is my experience in Hawaii; it may need to be adjusted for other places.

Let’s see what we need first! We will need (the original list is from Dr. Ayron Strauch, a hydrologist at CWRM):

Kodama and Barnes (1997) give a thorough literature review of flash floods, and of flash floods in Hawaii in particular, from a meteorological perspective. They conducted their research on southeast Mauna Loa from 1978 through 1992. By looking into synoptic influences, rainfall patterns, stability indices, precipitable water, the upslope component of the surface winds, and midlevel moisture, they found:

  1. rain mainly occurs over the volcano slopes facing the low-level flow;
  2. the heaviest rainfall usually occurs upslope above the 0.5-km elevation level;
  3. midlevel moisture and the K index play important roles.
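
For readers unfamiliar with the K index mentioned in the third finding, here is a small sketch using its commonly cited definition, K = (T850 - T500) + Td850 - (T700 - Td700), with temperatures and dewpoints in °C. This is the textbook formula and not necessarily the exact computation used in the paper; the sounding values are made up.

```r
# K index from temperatures (T) and dewpoints (Td) in degrees Celsius
# at the 850, 700, and 500 hPa levels
k_index <- function(T850, Td850, T700, Td700, T500) {
  (T850 - T500) + Td850 - (T700 - Td700)
}

# Hypothetical sounding values, made up for illustration
k_index(T850 = 18, Td850 = 15, T700 = 8, Td700 = 4, T500 = -7)
```

The index grows with a steeper lapse rate, more low-level moisture, and a smaller dewpoint depression at 700 hPa, which is why it reflects midlevel moisture.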

I summarize their literature review and their conclusions in the table below:

| Applies to | Characteristic from earlier studies (Haraguchi 1977; Schroeder 1977, 1978a, 1981 for Hawaii; Maddox et al. 1979) | Hawaii (Kodama and Barnes 1997) |
| --- | --- | --- |
| Both | convective clouds | |
| CONUS | high surface dewpoint temperature | Surface dewpoints are always high around Hawaii. The relatively warm ocean surrounding the Hawaiian Islands keeps surface dewpoints between 16°C and 21°C most of the year. |
| CONUS | high moisture content through a deep tropospheric layer | [gap: no observations] The Hawaiian studies have not addressed tropospheric moisture content. |
| CONUS | weak to moderate vertical shear of the horizontal wind through cloud depth | [gap: no observations] Rarely studied. The Dracup et al. (1991) study of the 1987 New Year’s Eve flood on Oahu indicated substantial vertical wind shear. |
| CONUS | repeated formation and similar movement of the convective clouds | Quasi-stationary storm systems are evident in Hawaiian flash floods. Rain gauge records from several events, such as the December 1991 flood at Anahola, Kauai (DLNR 1992), and the April 1974 flood on Oahu (Schroeder 1977), show that high-intensity rain occurred on larger temporal scales than those identified with individual convective cells. Schroeder (1978a) suggested that Hawaiian heavy rain systems were “anchored” to the mountainous terrain of the islands. |
| CONUS | proximity to a large-scale 500-mb ridge position | [gap: no observations] The Hawaiian flash flood studies have not addressed the position of storms relative to a synoptic-scale ridge. |
| CONUS | a weak midlevel shortwave trough moving through or around the ridge | [gap: no observations] Hawaiian heavy rain studies have not addressed the influence of midtropospheric shortwaves, but there is evidence that the events are linked to one of four synoptic situations: cold fronts, Kona storms, upper-tropospheric disturbances, and tropical systems (Blumenstock and Price 1967). |
| Both | a nocturnal rainfall maximum (Haraguchi 1977; Maddox et al. 1979) | There appears to be a nighttime preference for Hawaiian flash floods. |
| Hawaii | the heaviest rain usually occurred on the slopes facing the prevailing low-level winds | |
| Hawaii | thunderstorms were associated with many of these events | Thunderstorms are not a requirement for Hawaiian flash floods (Schroeder 1978b; Dracup et al. 1991; Cram and Tatum 1979). Hawaiian precipitation systems can produce instantaneous rain rates greater than 250 mm/h with cloud tops below the freezing level (Fullerton and Wilson 1975). |
| Hawaii | an upper-tropospheric trough axis existed west of the state | |
| Hawaii | the direction of low-level winds determined storm rainfall distribution | |

Citation:
Kodama, K., & Barnes, G. M. (1997). Heavy Rain Events over the South-Facing Slopes of Hawaii: Attendant Conditions. Weather and Forecasting, 12, 347–367. https://doi.org/10.1175/1520-0434(1997)012<0347:HREOTS>2.0.CO;2

CONFERENCE/MEETING

Poster, “Temporal Shifts in the Magnitude of Peak Streamflow and its Associated Rainfall across the Hawaiian Islands.” American Geophysical Union 2018 Fall Meeting, Washington, DC, USA, 2018
Oral presentation, “Temporal Shifts in Peak Flow Magnitudes Across the Island of Maui, Hawaiʻi.” Asia Oceania Geosciences Society (AOGS) 15th Annual Meeting, Honolulu, HI, USA, 2018
Poster, “Apply Spatially Distributed Rainfall Data to a Hydrological Model in a Tropical Watershed in Hawaii.” American Geophysical Union 2017 Fall Meeting, New Orleans, LA, USA, 2017
Poster, “Summer Leeside Rainfall Maxima over the Island of Hawaii.” American Geophysical Union 2016 Fall Meeting, San Francisco, CA, USA, 2016
Oral presentation, “An evaluation of the stability indices for local convective systems in Taiwan.” East-West Center International Graduate Student Conference, Honolulu, HI, USA, 2016

Before the performance, kumu hula Noe told us:

Hula is a package deal: you need to get your timing, follow the rhythm, make the right gestures, and be grateful and strong at the same time. Also, you need to show your personality! If you just dance the dance, it’s not hula, student.

I’ve been dancing hula at UH with kumu Noe for two years. Kumu is strict, and at the same time she tries her best to respect her students, other halau, and other cultures. I like that she always insists on certain things about hula. She has boundaries for her hula. How about me? What are my boundaries for what I love?

I didn’t eat a lot this Thanksgiving. Instead, I went camping with friends in nature. It was great and made me feel even more grateful. Julie asked me a week ago if I wanted to camp, and I said yes. Thanks, Julie, for calling and applying for the permit ($10 per night per campsite).

The day started with waiting for Hal to turn in her job application to FAO in Rome. We packed and drove for about an hour from Manoa to Kahana Bay. After taking a break and chilling a bit, we decided to hike the Nakoa trail in Kahana Valley.

In the Ahupuaa Kahana State Park, we passed by some families’ homes (no photos, because I wouldn’t want a stranger taking photos of my living area without my permission either). They have an interesting relationship with this state park. In 1965, the state turned Kahana into a public park. The state, Kahana residents, and the community developed the concept of a “living park” that allowed residents to remain in the ahupuaa in exchange for interpretive services for park visitors. In 1988, the state Legislature authorized DLNR to issue leases to 31 families in Kahana. All residents were required to move to the rear of the park, opening the park entrance to the public. Yet the Legislature rejected new leases in later years. I’m not sure how the State or the residents think of their roles. I didn’t really feel the “cultural” part as I passed through; to me it felt more like a residential area. At the same time, CWRM has a restoration project for the Kahana stream: they’re removing the mangroves, Hau, to see if it can help the stream ecosystem upstream.

The Nakoa trail is a round-trip hike of about 2 to 3 hours: simple, not steep, and full of shade. The downsides of this hike are the mud, the mosquitoes, and having to cross the stream twice. If you go further from the second point of the hike, you might reach a swimming hole with a swing for jumping into the water (we didn’t go further because people were tired of the mosquitoes…).

The map info of Nakoa trail.
A lookout point on the way.

After hiking, Hal wanted to take a look at the fishpond, so we walked along the beach and then waded through the water toward it. It is amazing that we could just walk across with the water only up to our waists, and how far the sediment from the stream can travel. The sediment became finer and the water became deeper as we went further. Huilua Fishpond is one of the six fishponds that still exist out of 97 fishponds on Oahu. It allows small fish to swim in and out but keeps bigger fish in the pond.

Looking at Huilua Fishpond from Kahana Bay

After all the adventure, time to enjoy the grill and the simplicity of Kahana Bay. :)

Looking back at Kahana Valley from the ocean.
At the end of the left side of Kahana (toward the ocean), you can pick up parts of your childhood there!

It’s interesting to see that my last nonsense is just a month ago… :P

After class, I worked on my class peer review, and realized how my writing could have killed my advisor earlier, because the karma is coming back to me now… lol. One can really tell whether writing is good by how clear it is throughout, whether it presents the main ideas, its consistency, and its choice of words.

Met an old friend today. I was surprised that he contacted me. He became choppier and looked wiser. He is here for a research project and to visit another friend. We talked about how weather can shape people’s personalities and strongly affect their mental health. I guess so.

Then I finally went to the public library after 5 years in Hawaii. It’s really nice there, and I can borrow books from any public library on Oahu and return them to any of them!! Fantastic! I’ve also heard I can borrow Amazon e-books from the library; I need to find out how that works later on.

I started my poster for AGU, and hopefully I can finish the draft by Thursday before the meeting with my advisor.

I want to start some kind of life project. Any ideas?

Circular statistics is a very interesting tool for visualizing circular data, such as directions, angles, or times. I’m now applying it to visualize when peak flow occurs in Hawaii. I found this book very helpful: Circular Statistics in R by Arthur Pewsey, Markus Neuhauser, and Graeme D. Ruxton. I practiced the functions in the R package circular by following this book, and then applied them to my own research. I’m showing some simple examples below; see if you’re interested. :)

First, we need to install and load the package, circular:

install.packages("circular")
library(circular)

The first two exercises from the book look like this:

# `wind` is a data set of wind directions (in radians) shipped with the circular package
data(wind)
windc <- circular(wind, type = "angles", units = "radians", template = "geographics",
                  rotation = "clock", zero = pi/2)

# Linear plot of the raw values against observation number
plot(wind, xlab = "Observation number", ylab = "Wind direction (in radians)")

# Circular plot with compass labels and ticks every pi/8
plot(windc, cex = 1, bins = 720, stack = TRUE, sep = 0.035, shrink = 1.3)
axis.circular(at = circular(seq(0, 7*pi/4, pi/4), rotation = "clock", zero = pi/2),
              labels = c("N", "NE", "E", "SE", "S", "SW", "W", "NW"),
              zero = pi/2, rotation = "counter", cex = 1.1)
ticks.circular(circular(seq(0, 2*pi, pi/8)), zero = pi/2, rotation = "clock", tcl = 0.075)

## Rose diagram (histogram for circular data)
rose.diag(windc, bins = 16, cex = 1.5, prop = 1.3, shrink = 1.3, add = TRUE, axes = FALSE)

Here is how I use it for all the peak flows in Hawaii:

# allGage.circular is my circular object of peak-flow dates for all gages (not built here)
plot(allGage.circular, cex = 0.5, bins = 720, stack = TRUE, sep = 0.035, shrink = 1.8, axes = FALSE,
     col = "grey50")
# Label every 30 degrees with a month abbreviation (month.abb is built into R)
axis.circular(at = circular(seq(0, 330, 30), rotation = "clock", zero = pi/2, units = "degrees"),
              labels = month.abb, zero = pi/2, units = "degrees", rotation = "counter", cex = 1)
ticks.circular(circular(seq(0, 330, 30), rotation = "clock", zero = pi/2, units = "degrees"),
               zero = pi/2, rotation = "clock", tcl = 0.075)
# Kernel density estimate drawn on top of the stacked dots
lines(density.circular(na.omit(allGage.circular), bw = 10), lwd = 2, lty = 2, col = "red")
legend("bottomleft", legend = c("Peakflow Date for all gages", "Kernel Density Estimates"),
       col = c("grey50", "red"), lwd = c(NA, 2), lty = c(NA, 2), pch = c(16, NA))

When does peakflow occur in Hawaii?
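The plotting code assumes that a circular object of peak-flow dates already exists. As a rough sketch of one way such an object could be built (not necessarily the preprocessing used for the figure), the example below maps the day of year onto 0-360 degrees so that one full circle is one year. The dates and the names `peak_dates`, `angle_deg`, and `peaks_circ` are hypothetical.

```r
library(circular)

# Hypothetical annual peak-flow dates (illustration only)
peak_dates <- as.Date(c("2015-01-12", "2015-12-20", "2016-02-03",
                        "2016-11-28", "2017-03-15"))

# Day of year and year length (365 or 366 days)
doy     <- as.numeric(format(peak_dates, "%j"))
year    <- as.integer(format(peak_dates, "%Y"))
is_leap <- (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)

# Jan 1 maps to 0 degrees; one full circle is one year
angle_deg <- (doy - 1) / ifelse(is_leap, 366, 365) * 360

# Clockwise rotation with zero at the top, so the plot reads like a clock face
peaks_circ <- circular(angle_deg, units = "degrees", rotation = "clock", zero = pi/2)

# Mean resultant length: close to 1 when the dates cluster in one season
rho <- rho.circular(peaks_circ)
rho
```

Because the December and January dates sit next to each other on the circle, they count as close together, which an ordinary mean of day-of-year values would miss.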

It can also be used to compare different sources or different time blocks; for instance, I compared all the gages by island and by period. I only show figures here because the original code for them includes other calculations with loops, and it’s too much to show.

Peakflow occurrence time by island:
Peakflow occurrence time by island

Peakflow occurrence time by years:
Peakflow occurrence time by every 9 years

Feel free to leave your comments or questions. Here is the Amazon link to the book if you need it:

I appreciate all the effort that has gone into previous research. Some papers provide us knowledge; some papers correct our imagination of the world; some papers give a clear picture of how the world works; and some papers inspire and encourage us. Ashish’s paper is one of the inspiring ones, at least to me.

I met Dr. Ashish Sharma at the Asia Oceania Geosciences Society conference this year. I accidentally went to his talk and was impressed by how clearly he explained a concept. I also happened to present my peakflow research in the same panel as him (he was the invited speaker for our panel). He presented Wasko’s paper (2017) that time, and it was very interesting. I followed up a few weeks after the conference, but too much other work has buried me until now, when I’m doing the literature review for my Ph.D. dissertation proposal and the peakflow manuscript.

Too much intro; let’s take a look at what’s inside the paper by Sharma et al. (2018):

First, they give a very nice and clear literature review, with a one- or two-sentence summary for each paper, on how extreme precipitation has changed over time and how extreme streamflow has not changed in the same way. They point out the gap in our understanding and knowledge between rainfall and peakflow.

Second, they deconstruct the mechanisms of floods to approach their question, “If precipitation extremes are increasing, why aren’t floods?”, with a lot of supporting literature. They also describe some changes they expect.

Third, they recommend more effort on understanding the relationship between extreme rainfall and streamflow, including 1) changes to antecedent hydrologic conditions and their impact on flood response; 2) changes in the proportion and persistence of storms arising from different causative mechanisms, such as an increased proportion and frequency of convective extremes; 3) interaction among catchment size and geometry and changing storm characteristics including extent, intensity, and duration; 4) snow-cover and snow-volume changes and their changing contributions to flood extremes in a warmer climate; 5) the role of land cover change (especially, but not only, urbanization) and the interaction of land cover change with climatic factors.

I was inspired because I kept comparing my peakflow research to this context, kept asking why this and why that, and noticed the differences between their findings and mine. Some of these factors are definitely worth looking into.
