[GRASS5] GRASS raster "live links" proposal
Eric G. Miller
egm2 at jps.net
Wed, 20 Feb 2002 19:14:27 -0800
On Wed, Feb 20, 2002 at 06:15:52PM +0100, Markus Neteler wrote:
> Proposal for linking external raster data into GRASS mapsets through GDAL
> The metadata (color table, cellhd info, null and range info) would all
> remain in the existing file types within the GRASS mapset. However the
> actual raster data (normally found in the cell/<name> or fcell/<name>
> directories within the mapset) would be missing in the case of linked
Might consider not storing any excess metadata, see comments below...
> As well, the cellhd file would have an additional "link_filename:" line
> containing the file containing the actual raster file being linked to, and
> a "link_band:" line indicating what band within the referred to file was
> linked to. These extra lines would only occur in "linked" cell headers,
> not regular local ones.
Maybe a single line instead: "URL: file://path/to/file.ext?band=[1-n]"?
Conceptually, there's no reason this couldn't be an http or other
protocol URL. GRASS would need to cache data during the session for
performance reasons, and only requery if the protocol in use cached a
snapshot of the data for a window different from the "current" one. If
it just downloads a whole file into the cache, then a requery would not
be necessary.
Just a thought.
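
To make the URL idea a bit more concrete, here is a minimal sketch of
parsing such a link line. Everything in it is hypothetical (the struct
raster_link type, parse_raster_link(), and the rule that a missing query
means band 1); none of it is existing GRASS or GDAL API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for a parsed link line; not a GRASS type. */
struct raster_link {
    char scheme[16];  /* "file", "http", ... */
    char path[256];   /* text between "://" and "?" */
    int band;         /* 1-based; defaults to 1 when no query is given */
};

/* Parse "scheme://path?band=N". Returns 0 on success, -1 if malformed. */
int parse_raster_link(const char *url, struct raster_link *out)
{
    const char *rest, *query, *digits;
    char *end;
    size_t len;
    long band;

    assert(url != NULL && out != NULL);

    /* Scheme: everything before "://" */
    rest = strstr(url, "://");
    if (rest == NULL)
        return -1;
    len = (size_t)(rest - url);
    if (len == 0 || len >= sizeof(out->scheme))
        return -1;
    memcpy(out->scheme, url, len);
    out->scheme[len] = '\0';

    /* Path: everything up to an optional "?" */
    rest += 3;
    query = strchr(rest, '?');
    len = query ? (size_t)(query - rest) : strlen(rest);
    if (len == 0 || len >= sizeof(out->path))
        return -1;
    memcpy(out->path, rest, len);
    out->path[len] = '\0';

    /* Query: only "band=N" is understood in this sketch */
    out->band = 1;
    if (query != NULL) {
        if (strncmp(query + 1, "band=", 5) != 0)
            return -1;
        digits = query + 6;
        band = strtol(digits, &end, 10);
        if (end == digits || *end != '\0' || band < 1 || band > 32767)
            return -1;
        out->band = (int)band;
    }
    return 0;
}
```

Keeping the scheme separate is what would let a later http (or other)
backend slot in without changing the cellhd format.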
> API Changes
> Because virtually all of the metadata is stored in the regular auxiliary
> files, it is anticipated that not very many functions would need to be
> altered. In particular the following (or underlying internal functions)
> would need to be altered:
Probably more than that would need modification.
> It might be necessary to modify this slightly to remove linked cells properly
> (though it shouldn't delete the file linked to).
Think g.remove should handle missing parts okay. But there are a few
other modules that may need modification, especially those that attempt
to edit a raster in place (r.compress, r.null, d.rast.edit, others?).
> Modify to list linked filename and band for linked datasets.
> Potentially a new program will be needed to update the link pointer if the
> external file moves. Should this program also support "refreshing" metadata
> with regard to raster updates to the external file? I am suggesting
> updating the null mask, range info and histogram info.
> Open Issues
> o Should we really put the link information in the cellhd file? I don't
> think it is desirable to extend the Cell_Head structure, so perhaps we
> should keep the link information somewhere else?
Don't see that it'd be that big of a burden on Cell_head.
> o The G_open_cell_old() code for linked rasters would need to do some
> consistency checking to ensure that the GDAL raster size still matches
> that in the cellhd file. Should we try to recognise when the raster file
> has changed, and that information about the linked file in GRASS, such as
> the histogram, range and nulls may be wrong? I think we should not but
> this may cause problems if the linked file is altered outside of GRASS.
Most likely will need to do this. Possibly storing the file's mtime is
enough, but maybe an md5sum would be better?
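
A minimal sketch of the cheap variant of that check, assuming a POSIX
stat() and assuming the link metadata had recorded the file's mtime and
size when the link was made (link_target_changed() and both "saved"
values are hypothetical, not part of GRASS). An md5sum would catch more
cases, but costs a full read of the file.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Compare the linked file's current mtime and size with the values
 * recorded at link time.
 * Returns 1 if either differs, 0 if both match, -1 if stat() fails
 * (for example, the file was moved or deleted).
 */
int link_target_changed(const char *path, time_t saved_mtime, off_t saved_size)
{
    struct stat st;

    assert(path != NULL);

    if (stat(path, &st) != 0)
        return -1;
    return (st.st_mtime != saved_mtime || st.st_size != saved_size) ? 1 : 0;
}
```

A result of 1 would be the cue to regenerate the histogram, range and
null info; -1 would be the cue to ask the user to fix the link.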
> o Should we try to use the GDAL file's native concept of nulls instead of
> that in GRASS? For now I think not, but perhaps eventually.
Since this is a GRASS 5.1 concept, I think we might as well ditch the
NULL file concept. Don't know how GDAL handles NULLs. The only difficult
NULL situation is with small integer data sets. What's a NULL for
"char" sized data (-128)? Bigger integers can just use SHRT_MIN,
INT_MIN, or LONG_MIN.
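
As a sketch of what in-band NULLs could look like, the following picks
the most negative value of each signed integer width as the NULL marker.
The function names are made up for illustration and are not GRASS API.
Note the cost for the "char" case: reserving SCHAR_MIN takes one of only
256 representable values, and unsigned byte data that uses its full
range would have no spare value at all.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Return the in-band NULL marker for a signed integer cell that is
 * nbytes wide: the most negative value of the matching C type.
 */
long null_sentinel(size_t nbytes)
{
    assert(nbytes == 1 || nbytes == sizeof(short) ||
           nbytes == sizeof(int) || nbytes == sizeof(long));

    if (nbytes == 1)
        return SCHAR_MIN;
    if (nbytes == sizeof(short))
        return SHRT_MIN;
    if (nbytes == sizeof(int))
        return INT_MIN;
    return LONG_MIN;
}

/* Nonzero if value is the NULL marker for cells of the given width. */
int is_null_cell(long value, size_t nbytes)
{
    return value == null_sentinel(nbytes);
}
```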
> o At the code level GRASS already supports virtual datasets defined by
> r.mapcalc, right? Perhaps the code that supports this will provide a
> guide for where to hook in the GDAL support code.
That's all done by r.mapcalc AFAIK. Don't think there's anything in
libgis supporting this.
> o Are the code segments that maintain range information, histograms and so
> forth abstracted from the raster data access? If they are built right
> into the code that writes the cells there could be problems with linked
> files and extra work to be done. If they build up their meta information
> using the "regular" raster access API there should be no problems.
Most of this data is generated when a raster is written. Generally,
G_update_cell_stats() is called when integer cell files are being
written, iff FCB.want_histogram is true. Range info is always updated
per row (see put_row.c), via either G_row_update_range() or
G_row_update_fp_range(). The general read/write routines for these
files just read/write the "auxiliary file".
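
To illustrate the per-row idea, here is a simplified stand-in for an
integer range update. It is not the libgis code: struct int_range,
range_init() and range_update_row() are hypothetical names, and the NULL
marker is passed in explicitly. For a linked raster that GRASS never
writes, something like this would have to be run over rows as they are
read instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical running min/max; not the GRASS Range structure. */
struct int_range {
    int min;
    int max;
    int has_data;  /* 0 until the first non-NULL cell is seen */
};

void range_init(struct int_range *r)
{
    assert(r != NULL);
    r->min = 0;
    r->max = 0;
    r->has_data = 0;
}

/* Fold one row of cells into the running range, skipping NULL cells. */
void range_update_row(const int *row, int ncols, int null_value,
                      struct int_range *r)
{
    int col;

    assert(row != NULL && r != NULL && ncols >= 0);

    for (col = 0; col < ncols; col++) {
        if (row[col] == null_value)
            continue;
        if (!r->has_data) {
            r->min = r->max = row[col];
            r->has_data = 1;
        }
        else if (row[col] < r->min)
            r->min = row[col];
        else if (row[col] > r->max)
            r->max = row[col];
    }
}
```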
The biggest issue will be hooking the external GDAL read mechanism
into the funky black magic of the cell window buffer code hell that
resides in libgis, so as to make the reading of such data transparent
to other modules. Essentially, there will be at least two data
abstractions (and probably copies) before the "final" data gets
put in the user's row buffer. It might be possible to short
circuit some of the work (so only the gdal code deals with internal
buffers) by modifying "struct G__" in G.h. Need a hook in there
to be sure you can catch when a read via gdal should occur. Probably,
that's where you'd have to park a pointer to any GDAL internal data
structure that needs to be maintained between calls.
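
Roughly the kind of hook I have in mind, as a sketch only. The struct
below is a made-up stand-in, not the real layout of "struct G__" or its
per-file info, and no actual GDAL calls are made: stub_external_read()
just fakes an external driver so the dispatch can be seen working. The
driver_data pointer is where a GDAL dataset/band handle would be parked
between calls.

```c
#include <assert.h>
#include <stddef.h>

/* Signature an external (e.g. GDAL-backed) row reader would implement. */
typedef int (*row_reader_fn)(void *driver_data, int row, int *buf, int ncols);

/* Hypothetical per-open-raster record. */
struct open_raster {
    int ncols;
    row_reader_fn external_read;  /* NULL for a regular GRASS raster */
    void *driver_data;            /* opaque state kept between calls */
};

/* Placeholder for the existing native read path; just zero-fills. */
static int native_read_row(struct open_raster *r, int row, int *buf)
{
    int col;

    (void)row;
    for (col = 0; col < r->ncols; col++)
        buf[col] = 0;
    return 0;
}

/* Single entry point used by modules; dispatches on the hook. */
int get_row(struct open_raster *r, int row, int *buf)
{
    assert(r != NULL && buf != NULL);

    if (r->external_read != NULL)
        return r->external_read(r->driver_data, row, buf, r->ncols);
    return native_read_row(r, row, buf);
}

/* Stub driver: fills cells with row*100+col and counts calls. */
int stub_external_read(void *driver_data, int row, int *buf, int ncols)
{
    int col;
    int *calls = driver_data;

    for (col = 0; col < ncols; col++)
        buf[col] = row * 100 + col;
    (*calls)++;
    return 0;
}
```

Modules would keep calling the one entry point and never know whether
the row came from the mapset or from the external file.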
Eric G. Miller <egm2 at jps.net>