[GRASS5] Problems with vector import, and a suggestion
David D Gray
ddgray at armadce.demon.co.uk
Thu, 07 Feb 2002 15:22:10 +0000
Aleksey Naumov wrote:
> Hi GRASS developers,
> I just killed 3 days to import a rather simple (86 polygons, 6 holes)
> file into GRASS. In the process I lost some hair :-), but hopefully also
> gained some insight which I would like to share with anyone who is
> read this...
> I am using GRASS 5.0, the HEAD branch from CVS, compiled on Feb 4 2002.
> 1. I first tried v.in.shape (and v.in.shape.pg). v.in.shape failed for both a
> line and a polygon shape files. It generated a lot of messages like:
> WARNING: line 5466 label: 217 matched another label: 455.
> Failed to attach an attribute (category 471) to a line.
> but, more importantly, the geometry was completely scrambled for both
> and polygon shapes.
v.in.shape has until now really been an experimental module whose
development has been driven by ongoing attempts to deal with the many
errors in linework that are so common in most of the desktop formats that
are based on whole polygon coverages, e.g. ESRI shapefile and MapInfo
MIF. That isn't to say that the errors caused in using these modules are
always or solely due to bad files. They have been hacked and hacked over
as we have found, and had to deal with, new layers of problems in the
structure of shape files, etc. (Question: should we bother with trying
to handle bad linework?) So by now the existing modules are really in an
unmaintainable state. The good news is that replacements in the stable
branch for the v.*.shape and v.*.mif modules are nearly ready, and
should be available for the next snapshot in the pre-release sequence
for GRASS5.0. These will probably at first just duplicate the errors in
the current modules, but they should be much easier to debug and maintain.
> 2. Next I tried v.in.arc (and v.in.arc.pg). I had to do some work in
> to convert polygon shape file to correct coverage, then UNGENERATE
> polygon labels. Here I got correct geometry, but polygon ids (dig_att
> were screwed up:
Using Arc/Info is the critical step here. Ungen' files are in a
structured topological format, like GRASS files, so import should be a
simple matter. I don't know why it doesn't handle atts properly. Maybe we
need to look at this as well.
> 3. Finally, "m.in.e00" did the job for me. It created correct
> built correct dig_att file. Together with "pg.in.dbf" for the associated
> attribute file (ARC's .PAT file saved in TABLES with INFODBASE, then
> and re-exported in ArcView) I now got a complete vector file with data. I am
> finally able to map the attributes as described in the GRASS/Postgres
Again a direct transfer from the A/I format which is similar to GRASS.
> Resume and suggestions
> It's been quite a hair-pulling experience. Of course, it's possible
> did some things wrong, but I tried these and related commands
> etc) in many different ways and quite a few times. I am trying to
> to do to make this sort of vector import easier...
> Here are some suggestions. I apologize in advance if I missed
> am completely off on something -- let me know.
> 1. My experience has been that quality of modules varies greatly.
> fine, some are buggy and some just do not work. What makes the situation
> worse for the user is that modules seem to overlap and duplicate each
> and it's not clear which one to use (for example in my case, besides
> modules mentioned above there are also v.import and v.in.arc.poly --
> confusing to say the least!)
These stem from early attempts at integration or to deal with problems
arising from the polygon coverages which were a new format at the time.
> In the long run modules with similar functionality will have to be
> some discarded. In the meantime, it seems a useful clean-up strategy
> (a) Establish sort of a standard (e.g. for GRASS 5.1) -- a set of
> requirements that modules have to comply with (coding standards,
> up-to-date and detailed help page, etc.)
> (b) Select a few most useful modules and pull them up to this standard
> (c) Identify those modules that conform to the standard in the help
> They will be seen as reliable, get more testing, while others may be
> merged/upgraded gradually.
It is timely you should raise these points. There is a plan, beginning
to take shape now for GRASS 5.1, to move much of the processing
functionality into library routines, and just have the 'modules' as high
level interfaces that integrate these functions to perform specific
tasks. It has been suggested also to have the modules written in a
scripting or 'macro' language like Python, for easier development.
Standards relating to such things as options are also being developed.
> 2. Some sort of functional listing of modules in the help pages would be
> nice, maybe even based on classification used for TclTk interface.
> 3. No specific suggestion here, just complaining :-) about handling
> attributes. Associating vectors with attributes through point markers
> dig_att files just seems too difficult and unnatural to me. May be I am
> missing something here... I don't know how it's done in 5.1.
It is, admittedly, a weak procedure. It just used to be standard, and
was traditionally used by Arc/Info, which was the standard GRASS had to
look to for compatibility with its main competitors. This, by the way, is
why you get warnings like:
WARNING: line 5466 label: 217 matched another label: 455
The labels get muddled up with the wrong lines/points/areas if
something goes wrong. A single bad line can contaminate many good ones in
this way, so you can get a silent error and not notice.
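
To make the failure mode concrete, here is a rough sketch of attaching
label points to areas with a point-in-polygon test. It is not the GRASS
code: point_in_polygon, attach_labels and the toy squares are all
hypothetical, and the warning text only imitates the message above. When
the shared boundary between two squares is lost, both label points land in
one merged area and the second label is rejected.

```python
# Hypothetical sketch, not GRASS code: attach label points to areas.

def point_in_polygon(pt, ring):
    """Ray-casting test; ring is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def attach_labels(areas, labels):
    """areas: {area_id: ring}; labels: list of (category, (x, y))."""
    assigned = {}
    warnings = []
    for cat, pt in labels:
        for area_id, ring in areas.items():
            if point_in_polygon(pt, ring):
                if area_id in assigned:
                    warnings.append(
                        f"area {area_id} label: {cat} "
                        f"matched another label: {assigned[area_id]}"
                    )
                else:
                    assigned[area_id] = cat
                break
    return assigned, warnings


labels = [(217, (0.5, 0.5)), (455, (1.5, 0.5))]

# Clean linework: two unit squares sharing one boundary.
good_areas = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
# Bad linework: the shared boundary is missing, so only one area is built.
bad_areas = {"M": [(0, 0), (2, 0), (2, 1), (0, 1)]}

good_assigned, good_warnings = attach_labels(good_areas, labels)
bad_assigned, bad_warnings = attach_labels(bad_areas, labels)
print(good_assigned, good_warnings)
print(bad_assigned, bad_warnings)
```

In the bad case one category is silently lost from the result, and only the
warning hints that anything went wrong.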
GRASS 5.1 does away with this, and codes the categories (or at least
indices) into the main binary file that contains the vector lines. The
actual data is stored in an RDBMS. GRASS has the built-in dbmi interface,
which will be the default.
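
A minimal sketch of that arrangement, assuming an in-memory SQLite table in
place of whatever database is really used. The record layout, the attrs
table and attributes_for are made up for illustration and say nothing about
the actual 5.1 file format or the dbmi API. The point is only that each
line carries its own category, and the attributes are looked up by that
key.

```python
import sqlite3

# Hypothetical geometry records: each line stores its own category number.
lines = [
    {"cat": 217, "coords": [(0, 0), (1, 0), (1, 1)]},
    {"cat": 455, "coords": [(1, 1), (2, 1)]},
]

# Attribute table kept in a database, keyed by the same category.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attrs (cat INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO attrs (cat, name) VALUES (?, ?)",
    [(455, "road"), (217, "river")],  # row order is irrelevant
)


def attributes_for(line):
    """Look up the attribute row by the category stored with the line."""
    row = conn.execute(
        "SELECT name FROM attrs WHERE cat = ?", (line["cat"],)
    ).fetchone()
    return row[0] if row else None


names = [attributes_for(line) for line in lines]
print(names)
```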
We should retain an ability to remain compatible with older versions and
still have the ability to apply area points and their attributes, as the
idea is that an `area point' is a representative point of the interior
region of a polygon. Say v.in/out.atts.
The lamest choice of all however would be to go down the road of
co-sequencing as MIF/MID and shapefile do. This is unstable as a means
for transferring data (i.e. as an interchange format), and disastrous when
used as the main method of data storage in your application.
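
A small illustration of why co-sequencing is fragile, using made-up data
and hypothetical helper names (join_by_position, join_by_key). Geometry and
attribute records are matched purely by their order, so if one attribute
record goes missing, every later shape silently picks up the wrong
attributes. Matching on an explicit key leaves an obvious gap instead.

```python
# Hypothetical data: three shapes and their attribute records.
shapes = ["poly_A", "poly_B", "poly_C"]
keyed_shapes = [(1, "poly_A"), (2, "poly_B"), (3, "poly_C")]
records = [
    {"id": 1, "name": "Alpha"},
    {"id": 2, "name": "Beta"},
    {"id": 3, "name": "Gamma"},
]


def join_by_position(shapes, records):
    """Co-sequencing: the n-th shape gets the n-th attribute record."""
    return dict(zip(shapes, (r["name"] for r in records)))


def join_by_key(keyed_shapes, records):
    """Keyed join: each shape carries the id of its attribute record."""
    by_id = {r["id"]: r["name"] for r in records}
    return {shape: by_id.get(key) for key, shape in keyed_shapes}


# Simulate losing the second attribute record during a transfer.
damaged = records[:1] + records[2:]

positional = join_by_position(shapes, damaged)
keyed = join_by_key(keyed_shapes, damaged)
print(positional)  # poly_B now carries poly_C's attributes
print(keyed)       # poly_B is visibly missing, poly_C is still correct
```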