Module netCDF4

Introduction

Python interface to the netCDF version 4 library. netCDF version 4 has many features not found in earlier versions of the library and is implemented on top of HDF5. This module can read files created with netCDF versions 2 and 3, but writes files which are only compatible with netCDF version 4. To create files which are compatible with netCDF 3 clients, use the companion netCDF4_classic module. The API is modelled after Scientific.IO.NetCDF, and should be familiar to users of that module.

Many new features of netCDF 4 are implemented, such as multiple unlimited dimensions, groups and zlib data compression. All the new primitive data types (such as 64 bit and unsigned integer types) are implemented, including variable-length strings (NC_STRING). The 'vlen' and 'compound' user-defined data types are supported. Vlen types are variable-length, or 'ragged' arrays, while compound types are similar to C structs (and numpy record arrays). Compound type support is not complete, since only compound types containing primitive data types (and not user-defined data types) can be read or written with this module. In other words, you can't yet use this module to save nested record arrays (record arrays with fields that are record arrays), although you can save any record array containing fields with any of the 'standard' fixed-size data types ('f4', 'f8', 'i1', 'i2', 'i4', 'i8', 'u1', 'u2', 'u4', 'u8' and 'S1').
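For instance, a record array built only from these 'standard' fixed-size primitive types (and which could therefore be written as a compound-type variable) can be constructed with numpy as follows; the field names here are made up for illustration:

```python
import numpy as NP

# a record array containing only fixed-size primitive dtypes -- the
# kind of compound data this module can read and write (field names
# are hypothetical)
dtype = [('id', 'i4'), ('temp', 'f4'), ('flag', 'u1'), ('code', 'S1')]
ra = NP.zeros(3, dtype)
ra['id'] = [1, 2, 3]
ra['temp'] = [280.0, 281.5, 279.9]
ra['flag'] = [0, 1, 0]
ra['code'] = ['a', 'b', 'c']
```

A record array with a field that is itself a record array (a nested record array) could not be saved with this module.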

Download

Requires

Install

Tutorial

1) Creating/Opening/Closing a netCDF file

To create a netCDF file from python, you simply call the Dataset constructor. This is also the method used to open an existing netCDF file. If the file is open for write access (w, r+ or a), you may write any type of data including new dimensions, groups, variables and attributes. netCDF files come in several flavors (NETCDF3_CLASSIC, NETCDF3_64BIT, NETCDF4_CLASSIC, and NETCDF4). The first two flavors are supported by version 3 of the netCDF library. NETCDF4_CLASSIC files use the version 4 disk format (HDF5), but do not use any features not found in the version 3 API. They can be read by netCDF 3 clients only if they have been relinked against the netCDF 4 library. They can also be read by HDF5 clients. NETCDF4 files use the version 4 disk format (HDF5) and use the new features of the version 4 API. The netCDF4 module can read files with any of these formats, but only writes NETCDF4 formatted files. To write NETCDF4_CLASSIC, NETCDF3_CLASSIC or NETCDF3_64BIT formatted files, use the netCDF4_classic module. To see how a given file is formatted, you can examine the file_format Dataset attribute. Closing the netCDF file is accomplished via the close method of the Dataset instance.

Here's an example:
>>> import netCDF4
>>> rootgrp = netCDF4.Dataset('test.nc', 'w')
>>> print rootgrp.file_format
NETCDF4
>>>
>>> rootgrp.close()

2) Groups in a netCDF file

netCDF version 4 added support for organizing data in hierarchical groups, which are analogous to directories in a filesystem. Groups serve as containers for variables, dimensions and attributes, as well as other groups. A netCDF4.Dataset creates a special group, called the 'root group', which is similar to the root directory in a unix filesystem. To create Group instances, use the createGroup method of a Dataset or Group instance. createGroup takes a single argument, a python string containing the name of the new group. The new Group instances contained within the root group can be accessed by name using the groups dictionary attribute of the Dataset instance.
>>> rootgrp = netCDF4.Dataset('test.nc', 'a')
>>> fcstgrp = rootgrp.createGroup('forecasts')
>>> analgrp = rootgrp.createGroup('analyses')
>>> print rootgrp.groups
{'analyses': <netCDF4.Group object at 0x24a54c30>, 
 'forecasts': <netCDF4.Group object at 0x24a54bd0>}
>>>

Groups can exist within groups in a Dataset, just as directories exist within directories in a unix filesystem. Each Group instance has a 'groups' attribute dictionary containing all of the group instances contained within that group. Each Group instance also has a 'path' attribute that contains a simulated unix directory path to that group.

Here's an example that shows how to navigate all the groups in a Dataset. The function walktree is a Python generator that is used to walk the directory tree.
>>> fcstgrp1 = fcstgrp.createGroup('model1')
>>> fcstgrp2 = fcstgrp.createGroup('model2')
>>> def walktree(top):
>>>     values = top.groups.values()
>>>     yield values
>>>     for value in top.groups.values():
>>>         for children in walktree(value):
>>>             yield children
>>> print rootgrp.path, rootgrp
>>> for children in walktree(rootgrp):
>>>      for child in children:
>>>          print child.path, child
/ <netCDF4.Dataset object at 0x24a54c00>
/analyses <netCDF4.Group object at 0x24a54c30>
/forecasts <netCDF4.Group object at 0x24a54bd0>
/forecasts/model2 <netCDF4.Group object at 0x24a54cc0>
/forecasts/model1 <netCDF4.Group object at 0x24a54c60>
>>>

3) Dimensions in a netCDF file

netCDF defines the sizes of all variables in terms of dimensions, so before any variables can be created the dimensions they use must be created first. A special case, not often used in practice, is that of a scalar variable, which has no dimensions. A dimension is created using the createDimension method of a Dataset or Group instance. A Python string is used to set the name of the dimension, and an integer value is used to set the size. To create an unlimited dimension (a dimension that can be appended to), the size value is set to None. In this example, both the time and level dimensions are unlimited.
>>> rootgrp.createDimension('level', None)
>>> rootgrp.createDimension('time', None)
>>> rootgrp.createDimension('lat', 73)
>>> rootgrp.createDimension('lon', 144)
All of the Dimension instances are stored in a python dictionary.
>>> print rootgrp.dimensions
{'lat': <netCDF4.Dimension object at 0x24a5f7b0>, 
 'time': <netCDF4.Dimension object at 0x24a5f788>, 
 'lon': <netCDF4.Dimension object at 0x24a5f7d8>, 
 'level': <netCDF4.Dimension object at 0x24a5f760>}
>>>
Calling the python len function with a Dimension instance returns the current size of that dimension. The isunlimited() method of a Dimension instance can be used to determine if the dimension is unlimited, or appendable.
>>> for dimname, dimobj in rootgrp.dimensions.iteritems():
>>>    print dimname, len(dimobj), dimobj.isunlimited()
lat 73 False
time 0 True
lon 144 False
level 0 True
>>>
Dimension names can be changed using the renameDimension method of a Dataset or Group instance.

4) Variables in a netCDF file

netCDF variables behave much like python multidimensional array objects supplied by the numpy module. However, unlike numpy arrays, netCDF4 variables can be appended to along one or more 'unlimited' dimensions. To create a netCDF variable, use the createVariable method of a Dataset or Group instance. The createVariable method has two mandatory arguments, the variable name (a Python string), and the variable datatype. The variable's dimensions are given by a tuple containing the dimension names (defined previously with createDimension). To create a scalar variable, simply leave out the dimensions keyword. The variable primitive datatypes correspond to the dtype.str attribute of a numpy array, and can be one of 'f4' (32-bit floating point), 'f8' (64-bit floating point), 'i4' (32-bit signed integer), 'i2' (16-bit signed integer), 'i8' (64-bit signed integer), 'i1' (8-bit signed integer), 'u1' (8-bit unsigned integer), 'u2' (16-bit unsigned integer), 'u4' (32-bit unsigned integer), 'u8' (64-bit unsigned integer), or 'S1' (single-character string). There is also an 'S' datatype for variable length strings, which have no corresponding numpy data type (they are stored in numpy object arrays). Variables of datatype 'S' can be used to store arbitrary python objects, since each element will be pickled into a string (if it is not already a string) before being saved in the netCDF file (see section 10 for more on storing arrays of python objects). Pickle strings will be automatically un-pickled back into python objects when they are read back in. There is also support for netCDF user-defined datatypes, such as compound data types and variable length arrays. To create a Variable with a user-defined datatype, set the datatype argument to an instance of the class UserType. See section 9 for more on user-defined data types. The dimensions themselves are usually also defined as variables, called coordinate variables.
The createVariable method returns an instance of the Variable class whose methods can be used later to access and set variable data and attributes.
>>> times = rootgrp.createVariable('time','f8',('time',))
>>> levels = rootgrp.createVariable('level','i4',('level',))
>>> latitudes = rootgrp.createVariable('latitude','f4',('lat',))
>>> longitudes = rootgrp.createVariable('longitude','f4',('lon',))
>>> # two dimensions unlimited.
>>> temp = rootgrp.createVariable('temp','f4',('time','level','lat','lon',))
All of the variables in the Dataset or Group are stored in a Python dictionary, in the same way as the dimensions:
>>> print rootgrp.variables
{'temp': <netCDF4.Variable object at 0x24a61068>,
 'level': <netCDF4.Variable object at 0x24a5f080>, 
 'longitude': <netCDF4.Variable object at 0x24a61030>,
 'pressure': <netCDF4.Variable object at 0x24a610a0>, 
 'time': <netCDF4.Variable object at 0x24a5f058>, 
 'latitude': <netCDF4.Variable object at 0x24a5f0b8>}
>>>
Variable names can be changed using the renameVariable method of a Dataset instance.

5) Attributes in a netCDF file

There are two types of attributes in a netCDF file, global and variable. Global attributes provide information about a group, or the entire dataset, as a whole. Variable attributes provide information about one of the variables in a group. Global attributes are set by assigning values to Dataset or Group instance variables. Variable attributes are set by assigning values to Variable instance variables. Attributes can be strings, numbers or sequences. Returning to our example,
>>> import time
>>> rootgrp.description = 'bogus example script'
>>> rootgrp.history = 'Created ' + time.ctime(time.time())
>>> rootgrp.source = 'netCDF4 python module tutorial'
>>> latitudes.units = 'degrees north'
>>> longitudes.units = 'degrees east'
>>> pressure.units = 'hPa'
>>> temp.units = 'K'
>>> times.units = 'hours since January 1, 0001'
>>> times.calendar = 'proleptic_gregorian'
The ncattrs() method of a Dataset, Group or Variable instance can be used to retrieve the names of all the netCDF attributes. This method is provided as a convenience, since using the built-in dir Python function will return a bunch of private methods and attributes that cannot (or should not) be modified by the user.
>>> for name in rootgrp.ncattrs():
>>>     print 'Global attr', name, '=', getattr(rootgrp,name)
Global attr description = bogus example script
Global attr history = Created Mon Nov  7 10:30:56 2005
Global attr source = netCDF4 python module tutorial
The __dict__ attribute of a Dataset, Group or Variable instance provides all the netCDF attribute name/value pairs in a python dictionary:
>>> print rootgrp.__dict__
{'source': 'netCDF4 python module tutorial',
'description': 'bogus example script',
'history': 'Created Mon Nov  7 10:30:56 2005'}
Attributes can be deleted from a netCDF Dataset, Group or Variable using the python del statement (i.e. del grp.foo removes the attribute foo from the group grp).

6) Writing data to and retrieving data from a netCDF variable

Now that you have a netCDF Variable instance, how do you put data into it? You can just treat it like an array and assign data to a slice.
>>> import numpy as NP
>>> latitudes[:] = NP.arange(-90,91,2.5)
>>> print 'latitudes =\n',latitudes[:]
latitudes =
[-90.  -87.5 -85.  -82.5 -80.  -77.5 -75.  -72.5 -70.  -67.5 -65.  -62.5
 -60.  -57.5 -55.  -52.5 -50.  -47.5 -45.  -42.5 -40.  -37.5 -35.  -32.5
 -30.  -27.5 -25.  -22.5 -20.  -17.5 -15.  -12.5 -10.   -7.5  -5.   -2.5
   0.    2.5   5.    7.5  10.   12.5  15.   17.5  20.   22.5  25.   27.5
  30.   32.5  35.   37.5  40.   42.5  45.   47.5  50.   52.5  55.   57.5
  60.   62.5  65.   67.5  70.   72.5  75.   77.5  80.   82.5  85.   87.5
  90. ]
>>>
Unlike numpy array objects, netCDF Variable objects with unlimited dimensions will grow along those dimensions if you assign data outside the currently defined range of indices.
>>> # append along two unlimited dimensions by assigning to slice.
>>> nlats = len(rootgrp.dimensions['lat'])
>>> nlons = len(rootgrp.dimensions['lon'])
>>> print 'temp shape before adding data = ',temp.shape
temp shape before adding data =  (0, 0, 73, 144)
>>>
>>> from numpy.random.mtrand import uniform
>>> temp[0:5,0:10,:,:] = uniform(size=(5,10,nlats,nlons))
>>> print 'temp shape after adding data = ',temp.shape
temp shape after adding data =  (5, 10, 73, 144)
>>>
>>> # levels have grown, but no values yet assigned.
>>> print 'levels shape after adding pressure data = ',levels.shape
levels shape after adding pressure data =  (10,)
>>>

Note that the size of the levels variable grows when data is appended along the level dimension of the variable temp, even though no data has yet been assigned to levels.

Time coordinate values pose a special challenge to netCDF users. Most metadata standards (such as CF and COARDS) specify that time should be measured relative to a fixed date using a certain calendar, with units specified like 'hours since YYYY-MM-DD hh:mm:ss'. These units can be awkward to deal with without a utility to convert the values to and from calendar dates. The netcdftime module is provided with this package to do just that. Here's an example of how it can be used:
>>> # fill in times.
>>> from datetime import datetime, timedelta
>>> from netcdftime import utime
>>> cdftime = utime(times.units,calendar=times.calendar,format='%B %d, %Y') 
>>> dates = [datetime(2001,3,1)+n*timedelta(hours=12) for n in range(temp.shape[0])]
>>> times[:] = cdftime.date2num(dates)
>>> print 'time values (in units %s): ' % times.units+'\n',times[:]
time values (in units hours since January 1, 0001): 
[ 17533056.  17533068.  17533080.  17533092.  17533104.]
>>>
>>> dates = cdftime.num2date(times[:])
>>> print 'dates corresponding to time values:\n',dates
dates corresponding to time values:
[2001-03-01 00:00:00 2001-03-01 12:00:00 2001-03-02 00:00:00
 2001-03-02 12:00:00 2001-03-03 00:00:00]
>>>
Values of time in the specified units and calendar are converted to and from python datetime instances using the num2date and date2num methods of the utime class. See the netcdftime.netcdftime documentation for more details.
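The idea behind date2num and num2date can be sketched with nothing but the standard library. This is only an illustration for the proleptic Gregorian calendar with a hard-coded reference date, not the netcdftime implementation (which handles other calendars and parses the units string):

```python
from datetime import datetime, timedelta

# illustrative stand-ins for utime.date2num/num2date, assuming
# units of hours since a fixed reference date
reference = datetime(2001, 3, 1)

def date2num(dates, ref=reference):
    # hours elapsed between each date and the reference date
    return [(d - ref).total_seconds() / 3600.0 for d in dates]

def num2date(nums, ref=reference):
    return [ref + timedelta(hours=h) for h in nums]

dates = [datetime(2001, 3, 1) + n * timedelta(hours=12) for n in range(5)]
nums = date2num(dates)        # [0.0, 12.0, 24.0, 36.0, 48.0]
roundtrip = num2date(nums)    # recovers the original datetimes
```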

7) Efficient compression of netCDF variables

Data stored in netCDF Variable objects is compressed on disk by default. The parameters for the compression are determined by the zlib, complevel and shuffle keyword arguments to the createVariable method. The default values are zlib=True, complevel=6 and shuffle=True. To turn off compression, set zlib=False. complevel regulates the speed and efficiency of the compression (1 being fastest, but lowest compression ratio, 9 being slowest but best compression ratio). shuffle=False will turn off the HDF5 shuffle filter, which de-interlaces a block of data by reordering the bytes. The shuffle filter can significantly improve compression ratios. Setting the fletcher32 keyword argument to createVariable to True (it is False by default) enables the Fletcher32 checksum algorithm for error detection.
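The effect of these knobs can be seen without netCDF at all, using python's zlib module directly; the byte 'shuffle' below is a rough sketch of what the HDF5 shuffle filter does, not the filter itself:

```python
import zlib
import numpy as NP

# some slowly-varying 8-byte floats, as raw bytes
data = (NP.arange(100000, dtype='f8') / 3.0).tobytes()

fast = zlib.compress(data, 1)   # complevel=1: fastest, largest
best = zlib.compress(data, 9)   # complevel=9: slowest, smallest

# de-interlace the bytes: byte 0 of every value, then byte 1, ...
# grouping the nearly-constant high-order bytes together lets zlib
# find much longer runs
shuffled = NP.frombuffer(data, dtype='u1').reshape(-1, 8).T.tobytes()
```

For data like this the shuffled byte order compresses noticeably better than the raw byte order, which is why shuffle=True is the default.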

If your data only has a certain number of digits of precision (say for example, it is temperature data that was measured with a precision of 0.1 degrees), you can dramatically improve compression by quantizing (or truncating) the data using the least_significant_digit keyword argument to createVariable. The least significant digit is the power of ten of the smallest decimal place in the data that is a reliable value. For example, if the data has a precision of 0.1, then setting least_significant_digit=1 will cause the data to be quantized using NP.around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4). Effectively, this makes the compression 'lossy' instead of 'lossless'; that is, some precision in the data is sacrificed for the sake of disk space.
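The quantization itself is easy to try out with numpy alone (the sample values below are invented):

```python
import numpy as NP

# quantize to a precision of 0.1, i.e. least_significant_digit=1:
# bits is the smallest integer with 1/2**bits <= 0.1, so bits = 4
bits = 4
scale = 2.0 ** bits                       # 16.0
data = NP.array([273.1589, 275.4362, 280.9253])
quantized = NP.around(scale * data) / scale
# quantized is now [273.1875, 275.4375, 280.9375]: every value agrees
# with the original to within 0.1, and the discarded mantissa bits
# are what makes the quantized data compress so much better
```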

In our example, try replacing the line
>>> temp = rootgrp.createVariable('temp','f4',('time','level','lat','lon',))
with
>>> temp = rootgrp.createVariable('temp','f4',('time','level','lat','lon',),
least_significant_digit=3)
and see how much smaller the resulting file is.

8) Converting netCDF 3 files to netCDF 4 files (with compression)

A command line utility (nc3tonc4) is provided which can convert a netCDF 3 file (in NETCDF3_CLASSIC or NETCDF3_64BIT format) to a NETCDF4_CLASSIC file, optionally unpacking variables packed as short integers (with scale_factor and add_offset) to floats, and adding zlib compression (with the HDF5 shuffle filter and fletcher32 checksum). Data may also be quantized (truncated) to a specified precision to improve compression.
>>> os.system('nc3tonc4 -h')
nc3tonc4 [-h] [-o] [--zlib=(0|1)] [--complevel=(1-9)] [--shuffle=(0|1)]
         [--fletcher32=(0|1)] [--unpackshort=(0|1)]
         [--quantize=var1=n1,var2=n2,..] netcdf3filename netcdf4filename
-h -- Print usage message.
-o -- Overwrite destination file
      (default is to raise an error if output file already exists).
--zlib=(0|1) -- Activate (or disable) zlib compression (default is activate).
--complevel=(1-9) -- Set zlib compression level (6 is default).
--shuffle=(0|1) -- Activate (or disable) the shuffle filter
                   (active by default).
--fletcher32=(0|1) -- Activate (or disable) the fletcher32 checksum
                      (not active by default).
--unpackshort=(0|1) -- Unpack short integer variables to float variables
                       using scale_factor and add_offset netCDF 
                       variable attributes (active by default).
--quantize=(comma separated list of "variable name=integer" pairs) --
  Truncate the data in the specified variables to a given decimal precision.
  For example, 'speed=2, height=-2, temp=0' will cause the variable
  'speed' to be truncated to a precision of 0.01, 
  'height' to a precision of 100 and 'temp' to 1.
  This can significantly improve compression. The default
  is not to quantize any of the variables.
If --zlib=1, the resulting NETCDF4_CLASSIC file will take up less disk space than the original netCDF 3 file (especially if the --quantize option is used), and will be readable by netCDF 3 clients as long as they have been linked against the netCDF 4 library.

9) Beyond homogeneous arrays of a fixed type - User-defined datatypes

User-defined data types make it easier to store data in a netCDF 4 file that does not fit well into regular arrays of a homogeneous type. NetCDF 4 supports compound types, variable length types, opaque types and enum types. Currently, only the variable length (or 'vlen') type and the 'compound' type are supported by this module.

A user-defined data type is created using the createUserType method of a Dataset or Group instance. This method returns an instance of the UserType class, and takes 3 arguments: the base data type, the type of user-defined data type ('vlen' or 'compound'), and an identification string. The base data type for a 'vlen' must be one of the fixed-size primitive data types ('S' is not allowed). The base data type for a 'compound' is a list of 3 element tuples. Each 3-tuple describes the type of one member of the compound type, and contains a name, a fixed-size primitive data type, and a shape. The UserType instance may then be passed to createVariable (instead of a string describing one of the primitive data types) to create a Variable with that user-defined data type. For example,
>>> vleni4 = rootgrp.createUserType('i4', 'vlen', 'vlen_i4')
>>> ragged = rootgrp.createVariable('ragged', vleni4, ('lat','lon'))

creates a Variable which is a variable-length, or 'ragged' array of 4-byte integers, with dimensions lat and lon.

To fill the variable length array with data, create a numpy object array of integer arrays and assign it to the variable with a slice.
>>> import random
>>> data = NP.empty(nlats*nlons,'O')
>>> for n in range(nlats*nlons):
>>>     data[n] = NP.arange(random.randint(1,10))+1
>>> data = NP.reshape(data,(nlats,nlons))
>>> ragged[:] = data
>>> print 'ragged array variable =\n',ragged[0:3,0:3]
ragged array variable =
[[[1] [1 2 3 4 5 6 7] [1 2]]
 [[1 2 3 4] [1 2 3 4 5 6 7 8] [1]]
 [[1 2 3 4 5 6 7] [1 2 3] [1 2 3 4 5 6 7]]]
Compound types are similar to C structs. They can be used to represent table-like structures composed of different primitive data types (the netCDF4 library supports nested compound types, but this module only supports fixed-size primitive data types within compound types). For example, compound types might be useful for representing multiple parameter values at each point on a grid, or at each time and space location for scattered (point) data. You can then access all the information for a point by reading one variable, instead of reading different parameters from different variables. Variables of compound type correspond directly to numpy record arrays. Here's a simple example using a compound type to represent meteorological observations at stations:
>>> # create an unlimited dimension called 'station'
>>> rootgrp.createDimension('station',None)
>>> # define a compound data type (a list of 3-tuples containing
>>> # the name of each member, its primitive data type, and its size).
>>> # Only fixed-size primitive data types allowed (no 'S').
>>> # Members can be multi-dimensional arrays (in which case the third
>>> # element is a shape tuple instead of a scalar).
>>> datatype = [('latitude', 'f4',1), ('longitude', 'f4',1),
>>>             ('sfc_press','i4',1),
>>>             ('temp_sounding','f4',10),('press_sounding','i4',10),
>>>             ('location_name','S1',80)]
>>> # use this data type definition to create a user-defined data type
>>> # called 'station_data'
>>> table = rootgrp.createUserType(datatype,'compound','station_data')
>>> # create a variable of type 'station_data'
>>> statdat = rootgrp.createVariable('station_obs', table, ('station',))
>>> # create record array, assign data to it.
>>> ra = NP.empty(1,statdat.dtype_base)
>>> ra['latitude'] = 40.
>>> ra['longitude'] = -105.
>>> ra['sfc_press'] = 818
>>> ra['temp_sounding'] = (280.3,272.,270.,269.,266.,258.,254.1,250.,245.5,240.)
>>> ra['press_sounding'] = range(800,300,-50)
>>> # only fixed-size primitive data types can currently be used
>>> # as compound data type members (although the library supports
>>> # nested compound types).
>>> # To store strings in a compound data type, each string must be 
>>> # stored as a fixed-size (in this case 80) array of characters.
>>> def stringtoarr(string,NUMCHARS):
>>>     # function to convert a string to an array of NUMCHARS characters
>>>     arr = NP.zeros(NUMCHARS,'S1')
>>>     arr[0:len(string)] = tuple(string)
>>>     return arr
>>> ra['location_name'] = stringtoarr('Boulder, Colorado, USA',80)
>>> # assign record array to variable slice.
>>> statdat[0] = ra
>>> # or just assign a tuple of values to variable slice
>>> # (will automatically be converted to a record array).
>>> statdat[1] = (40.78,-73.99,1002,
>>>             (290.2,282.5,279.,277.9,276.,266.,264.1,260.,255.5,243.),
>>>             range(900,400,-50),stringtoarr('New York, New York, USA',80))
This module doesn't support attributes of compound type. To associate an attribute like units with each member of the compound type, one workaround is to store a python dictionary as its string representation, as below. When this attribute is read back in, it can be converted into a python dictionary using the eval function. For clients in other languages (including C), serializing the dictionary with a cross-language format such as JSON (JavaScript Object Notation), a lightweight, language-independent data serialization format, is more portable than repr/eval.
>>> units_dict = {'latitude': 'degrees north', 'longitude': 'degrees east',
                  'sfc_press': 'Pascals', 'temp_sounding': 'Kelvin',
                  'press_sounding': 'Pascals', 'location_name': None}
>>> statdat.units = repr(units_dict)
>>> # convert units string back to a python dictionary.
>>> statdat_units = eval(statdat.units)
>>> # print out data in variable (including units attribute)
>>> print 'data in a variable of compound type:\n----'
>>> for data in statdat[:]:
>>>    for item in statdat.dtype_base:
>>>        name = item[0]
>>>        type = item[1]
>>>        if type == 'S1': # if array of chars, convert value to string.
>>>            print name,': value =',data[name].tostring(),'units =',statdat_units[name]
>>>        else:
>>>            print name,': value =',data[name],'units =',statdat_units[name]
>>>    print '----'
data in a variable of compound type:
----
latitude : value = 40.0 units = degrees north
longitude : value = -105.0 units = degrees east
sfc_press : value = 818 units = Pascals
temp_sounding : value = [ 280.29998779  272.          270.          269.          266.
  258.   254.1000061   250.          245.5         240.        ] units = Kelvin
press_sounding : value = [800 750 700 650 600 550 500 450 400 350] units = Pascals
location_name : value = Boulder, Colorado, USA units = None
----
latitude : value = 40.7799987793 units = degrees north
longitude : value = -73.9899978638 units = degrees east
sfc_press : value = 1002 units = Pascals
temp_sounding : value = [ 290.20001221  282.5         279.          277.8999939   276.  
  266.   264.1000061   260.          255.5         243.        ] units = Kelvin
press_sounding : value = [900 850 800 750 700 650 600 550 500 450] units = Pascals
location_name : value = New York, New York, USA units = None
----

10) Storing arrays of arbitrary python objects using the 'S' datatype

Variables with datatype 'S' can be used to store variable-length strings, or python objects. Here's an example.
>>> strvar = rootgrp.createVariable('strvar','S',('level',))
Typically, a string variable is used to hold variable-length strings. They are represented in python as numpy object arrays containing python strings. Below, an object array is filled with random python strings with random lengths between 2 and 12 characters.
>>> chars = '1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> data = NP.empty(10,'O')
>>> for n in range(10):
>>>     stringlen = random.randint(2,12)
>>>     data[n] = ''.join([random.choice(chars) for i in range(stringlen)])
Now, we replace the first element of the object array with a python dictionary.
>>> data[0] = {'spam':1,'eggs':2,'ham':False}
When the data is assigned to the string variable, elements which are not python strings are converted to strings using the python cPickle module.
>>> strvar[:] = data
When the data is read back in from the netCDF file, strings which are determined to be pickled python objects are unpickled back into objects.
>>> print 'string variable with embedded python objects:\n',strvar[:]
string variable with embedded python objects:
[{'eggs': 2, 'ham': False, 'spam': 1} QnXTY8B nbt4zisk pMHIn1F wl3suHW0OquZ
 wn5kxEzgE nk AGBL pe kay81]
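The round trip can be sketched with the pickle module on its own (this module uses cPickle internally, as noted above; the mechanism is the same):

```python
import pickle

# a non-string element is pickled to a (byte) string before being
# written, and a recognizably-pickled string is unpickled on read
obj = {'spam': 1, 'eggs': 2, 'ham': False}
stored = pickle.dumps(obj)        # what actually lands in the file
restored = pickle.loads(stored)   # what you get back when reading
```

restored compares equal to the original dictionary.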
Attributes can also be python objects, although the rules for whether they are saved as pickled strings are different. Attributes are converted to numpy arrays before being saved to the netCDF file. If the attribute is cast to an object array by numpy, it is pickled and saved as a text attribute (and then automatically unpickled when the attribute is accessed). So, an attribute which is a list of integers will be saved as an array of integers, while an attribute that is a python dictionary will be saved as a pickled string, then unpickled automatically when it is retrieved. For example,
>>> from datetime import datetime
>>> strvar.timestamp = datetime.now()
>>> print strvar.timestamp
2006-02-11 13:26:27.238042

Note that data saved as pickled strings will not be very useful if the data is to be read by a non-python client (the data will appear to the client as an ugly looking binary string). A more portable (and human-readable) way of saving simple data structures like dictionaries and lists is to serialize them into strings using a human-readable cross-language interchange format such as JSON or YAML. An example of this is given in the discussion of compound data types in section 9.
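For example, the units dictionary from section 9 could be stored with the json module (standard library in Python 2.6 and later) instead of repr, making the attribute readable from any language with a JSON parser:

```python
import json

units_dict = {'latitude': 'degrees north', 'longitude': 'degrees east',
              'sfc_press': 'Pascals', 'temp_sounding': 'Kelvin',
              'press_sounding': 'Pascals', 'location_name': None}
units_json = json.dumps(units_dict)   # string to store as the attribute
recovered = json.loads(units_json)    # on read, instead of eval
```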

All of the code in this tutorial is available in examples/tutorial.py, along with several other examples. Unit tests are in the test directory.


Contact: Jeffrey Whitaker <jeffrey.s.whitaker@noaa.gov>

Copyright: 2006 by Jeffrey Whitaker.

License: Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both the copyright notice and this permission notice appear in supporting documentation. THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

Classes
  Dataset
A netCDF Dataset is a collection of dimensions, groups, variables and attributes.
  Dimension
A netCDF Dimension is used to describe the coordinates of a Variable.
  Group
Groups define a hierarchical namespace within a netCDF file.
  UserType
A UserType instance is used to describe some of the new data types supported in netCDF 4.
  Variable
A netCDF Variable is used to read and write netCDF data.

Functions
  _get_att(...)
Private function to get an attribute value given its name
  _get_att_names(...)
Private function to get all the attribute names in a group
  _get_dims(...)
Private function to create Dimension instances for all the dimensions in a Group or Dataset
  _get_format(...)
Private function to get the netCDF file format
  _get_grps(...)
Private function to create Group instances for all the groups in a Group or Dataset
  _get_vars(...)
Private function to create Variable instances for all the variables in a Group or Dataset
  _set_att(...)
Private function to set an attribute name/value pair
  _set_default_format(...)
Private function to set the netCDF file format

Variables
  __version__ = '0.6.1'
  _key = 'S'
  _nctonptype = {1: 'i1', 2: 'S1', 3: 'i2', 4: 'i4', 5: 'f4', 6: 'f8...
  _nptonctype = {'i8': 10, 'f4': 5, 'u8': 11, 'i1': 1, 'u4': 9, 'S1'...
  _npversion = '1.0'
  _private_atts = ['_grpid', '_grp', '_varid', 'groups', 'dimensions',...
  _supportedtypes = ['i8', 'f4', 'u8', 'i1', 'u4', 'S1', 'i2', 'u1', 'i4...
  _value = 12


Variables Details

__version__

Value:
'0.6.1'

_key

Value:
'S'

_nctonptype

Value:
{1: 'i1',
 2: 'S1',
 3: 'i2',
 4: 'i4',
 5: 'f4',
 6: 'f8',
 7: 'u1',
 8: 'u2',
...

_nptonctype

Value:
{'S': 12,
 'S1': 2,
 'f4': 5,
 'f8': 6,
 'i1': 1,
 'i2': 3,
 'i4': 4,
 'i8': 10,
...

_npversion

Value:
'1.0'

_private_atts

Value:
['_grpid',
 '_grp',
 '_varid',
 'groups',
 'dimensions',
 'variables',
 'dtype',
 'file_format',
...

_supportedtypes

Value:
['i8', 'f4', 'u8', 'i1', 'u4', 'S1', 'i2', 'u1', 'i4']

_value

Value:
12