How to use Python netCDF4 groups

Network Common Data Form (NetCDF) is commonly used to store multidimensional geographic data, such as temperature, precipitation, and wind speed. Variables stored in NetCDF are often measured multiple times per day over large (continental) areas. With multiple measurements per day, data values accumulate quickly and become unwieldy to work with. When each value is also assigned to a geographic location, data management becomes even more complicated. NetCDF provides a solution to these challenges. This article will get you started with reading data from NetCDF files using Python.

Installation

NetCDF files can be read with a few different Python modules. For this article we’ll focus strictly on `netCDF4`, as it is my personal preference.

For information on how to read and plot NetCDF data in Python with xarray and rioxarray check out this article.

Installation is simple. I generally recommend using the Anaconda Python distribution to eliminate the confusion that can come with dependencies and versioning. To install with Anaconda (conda), simply type `conda install -c conda-forge netCDF4`. Alternatively, you can install with `pip install netCDF4`.

To be sure your `netCDF4` module is properly installed, start an interactive session in the terminal (type `python` and press ‘Enter’). Then run `import netCDF4 as nc`.
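If you would rather have a check that reports a missing module instead of raising an error, something like the following sketch may help. It uses only the standard library; `module_version` is a hypothetical helper written for this illustration, not part of `netCDF4`.

```python
import importlib
import importlib.util


def module_version(name):
    """Return the module's version string, or None if it is not installed."""
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec(name) is None:
        return None
    module = importlib.import_module(name)
    # Not every module defines __version__, so fall back to a placeholder.
    return getattr(module, "__version__", "unknown")


# Prints a version string if netCDF4 is installed, otherwise None.
print(module_version("netCDF4"))
```

A result of `None` means the install step above still needs to be run in the active environment.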

Loading a NetCDF Dataset

Loading a dataset is simple: just pass a NetCDF file path to `netCDF4.Dataset()`. For this article, I’m using a file containing climate data from Daymet.

import netCDF4 as nc

fn = '/path/to/file.nc4'
ds = nc.Dataset(fn)

General File Structure

A NetCDF file has three basic parts: metadata, dimensions and variables. Variables contain both metadata and data.

`netCDF4` allows us to access the metadata and data associated with a NetCDF file.

Access Metadata

Printing the dataset, `ds`, gives us information about the variables contained in the file and their dimensions.

print(ds)

And the output . . .

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4_CLASSIC data model, file format HDF5):
    start_year: 1980
    source: Daymet Software Version 3.0
    Version_software: Daymet Software Version 3.0
    Version_data: Daymet Data Version 3.0
    Conventions: CF-1.6
    citation: Please see http://daymet.ornl.gov/ for current Daymet data citation information
    references: Please see http://daymet.ornl.gov/ for current information on Daymet references
    dimensions(sizes): time(1), nv(2), y(8075), x(7814)
    variables(dimensions): float32 time_bnds(time,nv), int16 lambert_conformal_conic(), float32 lat(y,x), float32 lon(y,x), float32 prcp(time,y,x), float32 time(time), float32 x(x), float32 y(y)
    groups:

Above you can see information for the file format, data source, data version, citation, dimensions, and variables. The variables we’re interested in are `lat`, `lon`, `time`, and `prcp` (precipitation). With these variables we can find the precipitation at a given location for a given time. This file only contains one time step (the size of the time dimension is 1).
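As a rough sketch of how a location lookup could work, the snippet below finds the grid cell whose coordinates are closest to a target point. It is an assumption-laden illustration rather than the article's own method: `nearest_cell` is a hypothetical helper, the small synthetic grids stand in for `ds['lat'][:]` and `ds['lon'][:]`, and distance is measured in plain degrees, which is only a crude approximation.

```python
import numpy as np


def nearest_cell(lat2d, lon2d, target_lat, target_lon):
    """Return (row, col) of the cell closest to the target point.

    Distance is computed in degrees, which is adequate only for
    picking a nearby cell, not for measuring real distances.
    """
    dist_sq = (lat2d - target_lat) ** 2 + (lon2d - target_lon) ** 2
    return np.unravel_index(np.argmin(dist_sq), dist_sq.shape)


# Synthetic 3 x 4 coordinate grids standing in for the file's lat/lon variables.
lat2d, lon2d = np.meshgrid(
    [42.0, 41.0, 40.0], [-112.0, -111.0, -110.0, -109.0], indexing="ij"
)

row, col = nearest_cell(lat2d, lon2d, 40.9, -110.2)
print(row, col)
```

With a real dataset, the returned indices could then be used to index the precipitation variable, for example `ds['prcp'][0, row, col]`.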

Metadata can also be accessed as a Python dictionary, which (in my opinion) is more useful.

print(ds.__dict__)

OrderedDict([('start_year', 1980), ('source', 'Daymet Software Version 3.0'), ('Version_software', 'Daymet Software Version 3.0'), ('Version_data', 'Daymet Data Version 3.0'), ('Conventions', 'CF-1.6'), ('citation', 'Please see http://daymet.ornl.gov/ for current Daymet data citation information'), ('references', 'Please see http://daymet.ornl.gov/ for current information on Daymet references')])

Then any metadata item can be accessed with its key. For example:

print(ds.__dict__['start_year'])

1980

Dimensions

Access to dimensions is similar to file metadata. Each dimension is stored as a dimension class which contains pertinent information. Metadata for all dimensions can be accessed by looping through all available dimensions, like so:

for dim in ds.dimensions.values():
    print(dim)

<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 1
<class 'netCDF4._netCDF4.Dimension'>: name = 'nv', size = 2
<class 'netCDF4._netCDF4.Dimension'>: name = 'y', size = 8075
<class 'netCDF4._netCDF4.Dimension'>: name = 'x', size = 7814

Individual dimensions are accessed by name, for example `ds.dimensions['x']`.

Variable Metadata

Access variable metadata in the same manner as dimensions. The code below shows how this is done. I’ve forgone the output because it is quite lengthy.

for var in ds.variables.values():
    print(var)

The procedure to access information for a specific variable is demonstrated below for `prcp` (precipitation).

print(ds['prcp'])

And the output . . .

<class 'netCDF4._netCDF4.Variable'>
float32 prcp(time, y, x)
    _FillValue: -9999.0
    coordinates: lat lon
    grid_mapping: lambert_conformal_conic
    missing_value: -9999.0
    cell_methods: area: mean time: sum within days time: sum over days
    units: mm
    long_name: annual total precipitation
unlimited dimensions: time
current shape = (1, 8075, 7814)

Access Data Values

The actual precipitation data values are accessed by array indexing, and a `numpy` array is returned. All variable data is returned as follows:

prcp = ds['prcp'][:]

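Or a subset can be returned by indexing along each dimension. For example, `ds['prcp'][0, 4000:4005, 4000:4005]` (the index values here are only illustrative) selects the first time step and a 5 by 5 window of rows and columns, so the result is a 2D array.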

Conclusion

NetCDF files are commonly used for geographic time-series data. Initially, they can be a bit intimidating to work with because of the large amounts of data they contain and because the format differs from the CSV and raster files that are most commonly used. NetCDF is a great way to document geographic data because of its built-in documentation and metadata, which make it easy for end users to understand exactly what the data represent with little ambiguity. NetCDF data are accessed as numpy arrays, which present many possibilities for analysis and incorporation into existing tools and workflows.