I’m Moving!

This wild and crazy blog is pulling up stakes and bumping over to a new location:
Same insanely valuable content, new URL.  So if you are one of my two to three subscribers, here is the subscription to the new location.  See you over there!

Get Ready to Live

If you have the time, the inclination, and the wherewithal, then gosh I hope you will come check out the Emergency Management and Public Safety forum at the IMAGIN annual conference this May in Lansing, MI.  I’ll be presenting on Thursday the 4th with some salacious details of public safety applications we’ve built here at IDV Solutions.

Rapturous Abstract
Spanning Scale: Driving Understanding and Action in Emergency Management Visualizations

Emergency management systems are increasingly moving into extensible and secure online environments where access, specialization, and collaboration are more readily achieved.  As these systems, particularly the geographic component within them, gain confidence and market penetration, their contribution to the overall workflow of emergency managers is quickly evolving beyond the passive display of mapped data.
This presentation stems from solution development with several emergency management clients ranging from regional to federal to international levels.  These organizations, while varying in size and mission, at their core wish to more effectively process and present data, enable their people to have a greater sense of understanding and empowerment, and equip them to undertake a more concrete actionable response.  What are the similarities between clients of these scales?  What are the differences?
Specific examples will be discussed including scale of focus, contextualization, cartographic options and implications, technology, tandem visualization tools, and design considerations.

Over the years, I’ve been struck by the similarity of the goals and requirements of Emergency Management / Public Safety applications, regardless of the variety of the scope and functions of those systems.  Whether an emergency manager for a county or for a global entity, we’re all people, and in a crisis (or preferably before a crisis) folks need to get what’s going on, and take decisive and confident action.  How could I put it any better than these guys…

“I’m a person. Bret’s a person. You’re a person. That person over there is a person.
And each person deserves to be treated like a person.”



John Nelson / IDV Solutions / john.nelson@idvsolutions.com

Geometry and Geography in SQL Server 2008 Spatial Queries

SQL Server 2008 has great spatial capabilities, notably spatial querying.  But performing a Geographic spatial query is a different question and will return different results than a Geometric spatial query.  Neither method is wrong, but it is important to know the differences, and as such what you are asking, in order to understand the results.  Both options are valid, depending on what you are looking for.


Geography
Geography takes into account our relatively spherical earth, and so uses spherical geometry.  If you draw a straight line in Geography, it will follow the great circle line, which is just an awesome way of saying the shortest path between two points, like pulling a string tight between two points on a globe.


Why Geography?
You probably want to use this if you need very precise measurements over relatively long distances.  Since it is more of a “real world” option, Geography will return results that are truer to life (but will look squirrelly on flattened maps like Google Maps or Bing).

Why Not Geography?
There is a lot of overhead in setting up spatial data as Geography.  Since it has to accommodate the roundness of earth, you have to provide more parameters than you would with Geometry.
Also, when you see the results of straight lines drawn using Geography, they will appear as arched in a typical map application, and sometimes the confusion can be more trouble than the “accuracy” is worth.  Your call.
Also, if you are drawing a rectangular query in VFX (which assumes Geometry), Geography will return a different set of features than the query rectangle implies.

Geometry
Geometry imagines that the world is actually a flat rectangle (much like how we see it every day in our Mercator maps) and uses good old planar geometry.  Since most online map applications live in a Mercator world, planar geometry is a fine fit.  A straight East-West line will just follow the line of latitude, and as a result look straight on a Mercator or equirectangular map.

Why Geometry?
Geometry is a more straightforward mode to manage.  You can get your application up and running faster and have, generally, good enough spatial results like distance.  Also, Geometry will look better on a cylindrical projection such as Mercator (again, Google Maps and Bing) because your East-West and North-South lines will be “straight” as far as the map is concerned.
Oftentimes Lat/Long bounding box spatial queries are what folks are after.  If that is the case, Geometry is the answer.
Oh, and another good reason to use Geometry is because that is how VFX is set up to draw queries.

Why Not Geometry?
If you need very high precision or perhaps have geometry that passes the international date line, or your query areas span very large areas, then you should consider Geography.  After all…the Earth isn’t actually flat.
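
To get a feel for how far apart the two answers can be on something as simple as distance, here is a minimal Python sketch.  It assumes a perfectly spherical Earth with a mean radius of 6371 km, and the function names are made up for illustration; it is not a description of how SQL Server computes anything internally.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def great_circle_km(lat1, lon1, lat2, lon2):
    """Geography-style distance: haversine formula on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def planar_degrees(lat1, lon1, lat2, lon2):
    """Geometry-style distance: treat long/lat as flat x/y; the unit is degrees."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


# One degree of longitude, measured at the equator and at 60 degrees north.
for lat in (0.0, 60.0):
    flat = planar_degrees(lat, 0.0, lat, 1.0)
    round_km = great_circle_km(lat, 0.0, lat, 1.0)
    print(f"lat {lat:>4}: planar = {flat} deg, great circle = {round_km:.1f} km")
```

The planar answer is one degree in both cases, while the spherical answer at 60 degrees north is roughly half of what it is at the equator.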

Spatial Queries

A rectangular spatial query using the Geographic (spherical geometry) method is illustrated here.  My delicately manicured fingers are holding a rectangular shape over a globe.  It simply pulls a tight line around my four corner points, and returns anything within.  But look how hosed it appears in a planar system (Mercator)…

A Geographic rectangle query in its native spherical environment, and stretched over a planar system.

A bounding box query is illustrated here in a Mercator map.  Looks fine: notice how it respects lines of latitude and longitude.  Now check out how that bounding box looks on a real globe…

A Geometric bounding box query in its native planar environment, and warped onto a spherical system.

However, sometimes these results are confusing.  For example, Winnipeg is north of Vancouver and Anticosti Island (see awesome image above), but it clearly does not fall within my Geographic rectangle (because on a sphere, North is relative).

A Geographic rectangle query, with an indication of the Geometric version of the same bottom boundary.

If you are performing a Geographic query, but are anticipating the results of a Geometric query, you might, for example, be confused as to why items falling within the highlighted area above are omitted.  That red dashed line follows what would be the bottom edge of a Geometric query.  It’s all in the expectations of the question you are asking.
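
The Winnipeg surprise can be checked with a little vector math.  The sketch below assumes the bottom edge of the Geographic rectangle is the great circle from Vancouver to Anticosti Island, uses approximate city coordinates that I have filled in for illustration, and treats the Earth as a sphere.  The helper names are hypothetical.

```python
import math


def to_xyz(lat, lon):
    """Convert latitude/longitude in degrees to a unit vector."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lam),
            math.cos(phi) * math.sin(lam),
            math.sin(phi))


def great_circle_lat_at_lon(p1, p2, lon):
    """Latitude where the great circle through p1 and p2 crosses a longitude.

    p1 and p2 are (lat, lon) pairs in degrees.  Not valid when the great
    circle runs along a meridian (nz would be zero).
    """
    x1, y1, z1 = to_xyz(*p1)
    x2, y2, z2 = to_xyz(*p2)
    # Normal of the plane containing both points and the Earth's center.
    nx = y1 * z2 - z1 * y2
    ny = z1 * x2 - x1 * z2
    nz = x1 * y2 - y1 * x2
    lam = math.radians(lon)
    # Any point on the circle is perpendicular to the normal; solve for latitude.
    return math.degrees(math.atan(-(nx * math.cos(lam) + ny * math.sin(lam)) / nz))


vancouver = (49.28, -123.12)  # approximate coordinates
anticosti = (49.50, -63.00)   # approximate coordinates
winnipeg = (49.90, -97.14)    # approximate coordinates

edge_lat = great_circle_lat_at_lon(vancouver, anticosti, winnipeg[1])
print(f"Geographic edge at Winnipeg's longitude: {edge_lat:.1f} N")
print("Winnipeg north of the Geographic edge?", winnipeg[0] > edge_lat)
```

Under these assumptions the great-circle edge passes a few degrees north of Winnipeg, so a Geographic query leaves it out even though a Geometric bounding box with the same corners would include it.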

By the way, you’ll notice a bigger discrepancy between Geometry and Geography query results the farther you get from the equator.

What the Deuce IS a Great Circle?
Here is a really high-tech illustration of a great circle.  I’ve pulled a string tight over the globe between Vancouver and Anticosti Island.  This is the shortest path between these two places.  If you were to fly an airplane from Vancouver directly to Anticosti Island, you’d follow this string (the straight line between the two locations).  So check out how this perfectly straight line appears in Mercator!  This is the exact same image, warped onto a Mercator projection.  You’ll also notice that in the Mercator image, the globe’s lines of latitude now appear straight.

An actual straight line (as illustrated by that string on the globe up there), and that same straight line in planar geometry (in this case, Mercator).


John Nelson / IDV Solutions / john.nelson@idvsolutions.com

The Crazy World of Range Breaks



Divide and Color
What do all of these thematic maps have in common?  They are all mapping the exact same data.  What makes them so different?  They are using different classification methods.

One of the first things that pops up when whipping up a thematic data visualization that has discrete range breaks (commonly a choropleth map, but not necessarily) is how do I bucketize my numbers?  Picking range breaks to drive your color categorization can range anywhere from arbitrary and predefined to rigorously statistical and dynamic, and the various options will generate very different looking results.


Some Options
There are lots of ways to carve datasets into discrete classes.  I’ll go over three of them…

  • Quantile
    Breaks the data into equally filled groups
  • Standard Deviation
    Breaks the data into statistical chunks diverging from the mean
  • Equal Interval
    Breaks the data into equally distant groups

Or you could just eyeball the data and then divide it into range breaks that look good or are easy to read.  This, actually, is probably the most common method that I’ve seen in online mapping applications.  It is also the most fertile ground for misunderstanding.  More details on these methods below, including examples and how-to’s.


When choosing a method of classifying your dataset into discrete ranges, there are a couple of things to consider off the bat.  First, what does the data distribution look like (or, if it is dynamic, what does it generally look like)?  Is it skewed toward one extreme or the other?  Is it relatively normal (bell shaped on a histogram)?  Are there outliers to consider?
Applying various classification methods can create very different impressions of the data.  Any interface is a manifestation of tradeoffs, so let’s take a look at some examples…

  • Normal Data
    With relatively normally-distributed data, picking a classification method may not make a massive difference in your visualization.
    Your biggest concern is probably how many breaks to make, and what colors to use.  Check out this example of average age per county: nice and normal.


Normal data tends to deliver a relatively consistent message across many classification methods.


  • Skewed Data
    Now it gets fun.  With a dataset like the percent of folks who consider themselves multi-ethnic, the distribution is far from normal.  In this case, there is a bulge at the lower end and a long tail that eventually pinches off around 30% multi-ethnic.  What a difference the classification methods make here!
    Am I telling the truth with the map (above) on the left?  Yes.  I can clearly see the locations of higher and lower proportions of multi-ethnic US residents, even regional trends and abrupt shifts.
    Am I telling the truth with the map on the right?  Yes.  I get a clear indication that most places in the US are, proportionally, pretty low in multi-ethnic residents.
    I’m telling the truth about two different things.


Skewed data may look way different depending on the classification method.
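
A tiny experiment makes the contrast concrete.  This sketch uses twelve made-up “percent multi-ethnic” values (not the real county data) and hypothetical helper names, and simply counts how many values land in each bucket under Equal Interval and Quantile breaks.

```python
from bisect import bisect_left


def equal_interval_bounds(values, num_classes):
    """Upper bound of each class, evenly spaced across the data range."""
    lowest, highest = min(values), max(values)
    step = (highest - lowest) / num_classes
    # Use the true maximum as the last bound to avoid float rounding gaps.
    return [lowest + step * i for i in range(1, num_classes)] + [highest]


def quantile_bounds(values, num_classes):
    """Upper bound of each class so that classes hold equal counts (ties aside)."""
    ordered = sorted(values)
    return [ordered[(i * len(ordered)) // num_classes - 1]
            for i in range(1, num_classes + 1)]


def bucket_counts(values, upper_bounds):
    """How many values land in each class (upper bounds are inclusive)."""
    counts = [0] * len(upper_bounds)
    for v in values:
        counts[bisect_left(upper_bounds, v)] += 1
    return counts


# Made-up, skewed sample: a bulge at the low end and a long tail.
percents = [0, 1, 1, 2, 3, 3, 4, 5, 6, 9, 12, 30]

print(bucket_counts(percents, equal_interval_bounds(percents, 3)))  # [10, 1, 1]
print(bucket_counts(percents, quantile_bounds(percents, 3)))        # [4, 4, 4]
```

Same data, two honest answers: one shows that almost everything sits at the low end, the other spreads the color evenly and exposes the variation inside that low end.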


Examples in Detail



Equal Interval for Normal data…

Equal interval slices the data into equally distant range breaks.  Some color buckets get more counties than others, but if the distribution is wide, then the visualization will be adequate.

The gist: Evenly spaced, unevenly filled buckets.



Quantile for Normal data…

Quantile yields a pretty high-contrast map that is reliably good looking.  The fact that the data is normally distributed doesn’t really matter: each bucket has the exact same number of counties, but you’ll notice that in order to accommodate that, the ranges have to span varying distances.

The gist: Unevenly spaced, evenly filled buckets.



Standard Deviation for Normal data…

Trusty old Standard Deviation.  It is going to look alright in most cases, but it really shines when applied to normal datasets.  You’ve got the mean in the middle and you chunk it out from there based on standard deviation distances in either direction from the mean.  Beautiful.

Also, don’t do what I did: you should put actual values in your legend instead of the math nerd standard deviation breaks.

And while we’re at it, it’s often a good idea to pick a diverging color scheme for data that is classified by Standard Deviation.  Pick a neutral color for the mean (center) range and then transition to one color on the left and another color on the right.  ColorBrewer gives some nice background here along with a rocking tool to generate your own cartographic color schemes.

The gist: Evenly spaced (to a statistician), unevenly filled buckets.



Equal Interval for Skewed data…

Equal Interval falls apart pretty easily.  If the data is remotely skewed then it’s feast or famine for the evenly spaced color buckets.  In this case most of the buckets are largely empty while the low end bucket (0% – 6% multi-ethnic) is jam packed.

Equal Interval is more fair to the population as a whole but does not capture smaller scale fluctuations.

To be fair, just because all the eggs are in one basket and the map is largely monochromatic doesn’t mean that it’s useless.  You could argue that it is a perfectly fair treatment of the data because it illustrates the predominant characteristic of the data: it’s highly skewed to the lower end.

The gist: Evenly spaced, unevenly filled buckets.



Quantile for Skewed data…

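Quantile to the rescue.  When buckets are defined by an equal number of member counties, every color gets put to work, and the regional trends and smaller scale fluctuations that Equal Interval buried come into view.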

Now Devil’s Advocate.  It could be argued that this method implies a false or misleading heterogeneity of the data.  While the vast majority of counties have a proportionally tiny multi-ethnic population, this method could imply a greater variance (as compared to the Equal Interval example above).  It’s just not fair.

Devil’s Advocate, Advocate.  How could you get any more fair than groups of equal size?  Plus the result illustrates a finer articulation of the variance.

Just remember, when reading a map, read those legends and take the range breaks for what they are worth.  Quantile is a good illustration of that.

The gist: Unevenly spaced, evenly filled buckets.



Standard Deviation for Skewed data…

Standard Deviation.  It is still providing valuable visual breaks when applied to highly skewed data.  But I can never get too cozy with it because it is so darn hard to explain.

The gist: Evenly spaced (to a statistician), unevenly filled buckets.



Equal Interval

  • Determine overall population range (highest value – lowest value) for the value of interest…
  • Determine range break distance (population range / desired # of breaks)…
  • Starting from the lowest value, insert a break at every multiple of that distance.
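
The Equal Interval steps above can be sketched in a few lines of Python.  The function name and the sample ages are made up for illustration, and the breaks are returned as the upper bound of each class.

```python
def equal_interval_breaks(values, num_classes):
    """Return the upper bound of each class, evenly spaced across the data range."""
    lowest, highest = min(values), max(values)
    # Range break distance: overall range divided by the number of classes.
    step = (highest - lowest) / num_classes
    # Use the true maximum as the last bound to avoid float rounding gaps.
    return [lowest + step * i for i in range(1, num_classes)] + [highest]


ages = [30, 34, 36, 38, 40, 41, 45, 50]  # hypothetical average ages
print(equal_interval_breaks(ages, 4))  # [35.0, 40.0, 45.0, 50]
```

Note that nothing here looks at how many values end up in each class; the spacing is all that matters.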


Quantile

  • Sort the population and determine the equally-filled range quantity N (overall population count / desired # of groups)…
  • Segment the sorted population at every Nth item.
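
The equally-filled approach above might look like this in Python.  It is a minimal sketch: the function name and sample values are hypothetical, breaks are returned as the upper bound of each group, and tied values that straddle a boundary are not given any special handling.

```python
def quantile_breaks(values, num_classes):
    """Return the upper bound of each class so that classes hold equal counts."""
    if len(values) < num_classes:
        raise ValueError("need at least one value per class")
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for i in range(1, num_classes + 1):
        # Index of the last item that belongs to the i-th group.
        last = (i * n) // num_classes - 1
        breaks.append(ordered[last])
    return breaks


percents = [9, 1, 30, 2, 1, 5, 3, 2]  # hypothetical, deliberately unsorted
print(quantile_breaks(percents, 4))  # [1, 2, 5, 30]
```

Each group holds two of the eight values, and the distance between breaks varies wildly, which is exactly the trade Quantile makes.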

Standard Deviation
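
A minimal Python sketch of Standard Deviation breaks, following the description earlier in this post (the mean in the middle, chunked outward by standard deviation distances).  Centering a class on the mean by placing breaks at half-deviation offsets is one common convention and an assumption here, as are the function name and the sample values.  It uses the population standard deviation, on the assumption that the counties being mapped are the whole population.

```python
import statistics


def std_dev_breaks(values, classes_each_side=2):
    """Return interior breaks at mean +/- 0.5, 1.5, ... standard deviations.

    The center class straddles the mean; values beyond the outermost
    breaks fall into open-ended classes at either extreme.
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    offsets = [k + 0.5 for k in range(-classes_each_side, classes_each_side)]
    return [mean + offset * sd for offset in offsets]


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values: mean 5, std dev 2
print(std_dev_breaks(sample))  # [2.0, 4.0, 6.0, 8.0]
```

Those four breaks define five classes, which pairs nicely with a diverging color scheme: a neutral color for the 4 to 6 class around the mean, and one color ramp on each side.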


It’s one thing to willfully mislead others by the categorization and representation of data (obviously not cool).  It’s another to do it by accident and mislead your audience and yourself.  Varying classification methods will produce very different results.  In gaining a little background about various methods of classification you’ll be in a better position to…

  • Create better, more effective visualizations
  • More keenly understand the visualizations of others
  • Read the legend
  • Use what you learn for good instead of evil

In any case, the thing to keep in mind is effective and truthful communication; your visualization should enable the data to tell its story.

Let us know if we can be of any help.



John Nelson / IDV Solutions / john.nelson@idvsolutions.com


Data.gov: 100s of New Fed Datasets Released, Many Environmental

A quick note from Scott Caulk, Dir. of Product Management here at IDV

For those of you who troll for new web feeds, data.gov has added some new ones.  Where else can you find a World Copper Smelters SHP file?
By the way, there does not appear to be a direct correlation between copper smelting and seismic activity:

And then there was Mercator


Today ESRI announced via an emailed newsletter that “ArcGIS Online Maps Migrated to Google Maps/Bing Maps Tiling Scheme.”  This is good news for anyone who has ever had to wrestle with the GIS-y world of Plate Carrée tile schemes that don’t line up with the popular Mercator tile scheme (used by Google, Bing, Yahoo, and pretty much everyone else).

It looks like the service will be available for six more months (maybe more depending on pockets of outrage), with no updated content.


“Goodbye my worthy equirectangular foe.  We hardly knew ye.”


It looks like the standing of our rectangular amigo has taken a welcome hit.  While both projections have their pros and cons, they just didn’t play well together, and online application providers will have more options going forward as a result of the Plate Carrée scheme getting tossed.

Spatial Symbology

My family has been watching an excellent three-part broadcast on PBS called The Human Spark, where Alan Alda speaks with researchers and other scientists about the ‘nature of human uniqueness.’  In one segment, cognitive psychologist Elizabeth Spelke illustrates the ability of 2-year-olds to manipulate symbols and language to build spatial models of the real world, and how this is a specific example of a uniquely human ability.  So chew on that all you non-homo erectuses!

Check this out…


In the experiment above, the green dot stays a green dot until legending language turns it into a spatial model for the real world.  Neat stuff.

At IDV, a big part of what we do is to work with clients who have complicated business architectures that need to be visualized, often through thematic map symbology.  There are so many visual dimensions of point representations to consider, but equally important is their appropriate naming and meaningful organization.  Providing the language framework to bridge symbol to reality completes the spatial model and drives insightful design.




John Nelson / IDV Solutions / john.nelson@idvsolutions.com