Project Swiss Water Resources
Introduction
This project organizes data about Swiss water resources for researchers and data analysts. Multiple collaborators will analyse datasets that are centrally hosted and organised by your project team. Geographical information about rivers and stations, as well as regional surface information, is tightly coupled to measurements and publicly available sensor data. The relevant open data will be processed, stored and managed in a suitable manner in a database management system such as PostgreSQL/PostGIS. Additionally, you will create and store spatial and non-spatial information yourself.
Description
Description, requirement analysis, instructions and tasks are found in this project document on the server.
Reporting
How long should the report be (approximate page number)?
- The number of pages is not meaningful by itself; what matters is the completeness and comprehensibility of the document.
Should we read relevant literature about the topic and mention them in the project description?
- In principle, no additional literature is required for the project report. That said, the basic database book by Elmasri & Navathe is a good investment for the course and the future (any edition is worth the money). For the spatial part we recommend PostGIS in Action, 2nd edition.
Data analysis for ER and DB design
We recommend having a close look at the datasets that will later be imported into the database. Find more information in the previously mentioned project document. We used the following tools and methods to do this and to create an ER model. We used this template file (PowerPoint) to draw our spatial ER model.
1) Starting the analysis (based on source data files alone)
- Download the data needed (csv, xlsx, shape, gpkg, ...);
- Visualise spatial data in a new QGIS project
- Open .csv files in Notepad++ or a similar text editor
- Check information (metadata) on website(s)
- Try to find out the potential datatypes used in these datasets
- Check values and consistency for each field/column in the records/rows. Convenient functionality:
- QGIS -> Choose Layer -> Properties -> Fields
- QGIS -> Choose Layer -> Open Attribute Table: Sort each attribute to see what kind of data we really have.
- Text editor Notepad++: header text, data rows, sorting, missing or NULL or N/A values (which all mean the same thing).
- Do you have trouble with special characters such as ä or ü not displaying correctly in a shapefile column? Solution: add another text file with the file extension .cpg to your other 'shape files' (such as .shp, .dbf, .prj, ...). This .cpg file must have the same base name as the other files. Open the .cpg file, type UTF-8, save and close.
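The encoding fix above can also be scripted; a minimal Python sketch, assuming a hypothetical shapefile base name of rivers (replace with the base name of your own .shp/.dbf/.prj files):

```python
from pathlib import Path

# Hypothetical base name; your shapefile parts are rivers.shp,
# rivers.dbf, rivers.prj, so the sidecar must be rivers.cpg.
cpg = Path("rivers.cpg")

# A .cpg file containing just 'UTF-8' tells GIS software the attribute
# encoding of the .dbf, so characters like ä and ü display correctly.
cpg.write_text("UTF-8")
print(cpg.read_text())  # UTF-8
```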
2) During dataset analysis (still file based)
- Does the data definitely fit to the datatype expected or declared?
- Does the metadata give the correct information?
- What is the precise meaning of each column?
- Do I have missing, NULL, N/A values in geometry columns (and other columns)? What to do with these?
- Do we have to normalize? Are there hidden entity types (-> functional dependencies...) in the data?
- How can this data be fitted into an Entity Relationship model (ER)?
- What would - in a following step - the database datatypes (of each column/attribute) be?
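The checks above can be scripted for a quick first pass over a file; a minimal Python sketch using an invented sample standing in for one of the project CSV files (all column names are hypothetical):

```python
import csv
from io import StringIO

# Hypothetical sample data; in practice, open your real .csv file.
sample = """station;value;measured_on
STA-001;12.5;2021-03-01
STA-002;N/A;2021-03-01
STA-003;13.1;
"""

def profile_columns(text):
    """Collect the distinct raw values per column, to spot
    missing/NULL/N/A entries and guess suitable datatypes."""
    reader = csv.DictReader(StringIO(text), delimiter=";")
    values = {}
    for row in reader:
        for col, val in row.items():
            values.setdefault(col, set()).add(val)
    return values

profile = profile_columns(sample)
# 'N/A' and '' in these columns signal missing data to deal with.
print(sorted(profile["value"]))        # ['12.5', '13.1', 'N/A']
print(sorted(profile["measured_on"]))  # ['', '2021-03-01']
```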
3) Based on this analysis and the project description and requirements
- Build up your ER diagram
- Derive the relations (primary key fields; foreign keys)
- Design the database tables (with primary key, possibly foreign keys, datatypes). Hint: choose datatypes suitable for tables in PostgreSQL with PostGIS
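As a sketch of what such a destination table could look like in PostgreSQL/PostGIS — the entity, the column names and the projectxx schema are hypothetical and must follow your own ER design:

```sql
-- Hypothetical destination table for a "station" entity; adjust names,
-- types and constraints to your own design.
CREATE TABLE projectxx.station (
    station_id   integer PRIMARY KEY,
    name         varchar(256) NOT NULL,
    river_id     integer REFERENCES projectxx.river (river_id),
    geom         geometry(Point, 2056)   -- CH1903+/LV95 point geometry
);
```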
Importing spatial data to the database
1) Reproject your file-based spatial datasets if necessary
- Your project AND your datasets should be in EPSG:2056 (CH1903+/LV95)
- Reproject your file-based datasets (i.e. create a reprojected file copy with the 'Reproject layer' tool) to avoid later database import problems. Reason: QGIS projects can do visual on-the-fly reprojection of file-based datasets, but some database importers/formats cannot deal with it correctly.
- By the way: spatial datasets in the database with an undesired projection can be reprojected with SQL at any time! See FAQ.
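Such an in-database reprojection can be sketched as follows, assuming a hypothetical temporary table tmp_river whose geometry arrived in WGS84 (EPSG:4326):

```sql
-- Convert the geometry column to EPSG:2056, transforming every row.
ALTER TABLE projectxx.tmp_river
    ALTER COLUMN geom TYPE geometry(MultiLineString, 2056)
    USING ST_Transform(geom, 2056);
```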
2) Import file based spatial datasets to temporary spatial database tables
- Import the reprojected spatial datasets without any further changes to spatial tables in the database (use your projectxx schema). You should use a prefix in the table names to mark their temporary status, e.g. tmp_river. Recommendations concerning QGIS 3.28:
- Three possibilities are available in QGIS: 1) From QGIS top level menu: Database -> DB Manager Tool: Import Layer/File; 2) Processing Toolbox: Export to PostgreSQL; 3) Some datasets or formats work with Processing Toolbox: gdal PostgreSQL database exporter;
- Some of our datasets won't import correctly into the PostGIS database
- Timestamp/date datatypes in GeoPackages (.gpkg) do not work; use the FileGDB (file geodatabase) format instead, or convert the GeoPackage or CSV to a shapefile; after this, date fields will import as strings;
- Heterogeneous geometries (MultiLineStrings and LineStrings together in the same file-based spatial dataset) do not import completely; the Multipart to Singlepart tool (Processing Toolbox) lets you convert such GeoPackages to homogeneous singlepart features; the import to PostGIS should work afterwards;
3) Transfer and convert the data from the temporary tables to the destination spatial tables
- Create empty destination tables in the database reflecting your DB design.
- Transfer and convert data by using SQL statements from your temporary tables to destination tables.
- Datatypes which do not fit can be CAST in SQL on the fly! Example: a string field containing only numbers can be cast to an integer
- Write SQL queries to SELECT and CAST and transform data from your temporary tables. Use INSERT INTO to transfer the data to your destination tables. See example(s) here and in the lecture.
- Flatten a 3D geometry field to 2D directly in the source table and constrain to SRID 2056
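The transfer steps above can be sketched in SQL; all table and column names here are hypothetical and must match your own temporary and destination tables:

```sql
-- Transfer and convert: SELECT from the temporary table, CAST text
-- fields, flatten 3D geometries, and INSERT into the destination table.
INSERT INTO projectxx.river (river_id, name, length_km, geom)
SELECT
    CAST(gid AS integer),                -- text containing numbers -> integer
    name,
    CAST(length_km AS numeric),
    ST_SetSRID(ST_Force2D(geom), 2056)   -- drop Z, constrain to SRID 2056
FROM projectxx.tmp_river;
```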
4) Generate your own spatial data with attributes in your destination spatial tables
- Enter further data - as needed - to all your database tables;
- Use QGIS' editing features (try to find out how they work) to create or digitize new features (spatial records) directly in your spatial tables in the database.
- Use DBeaver with direct SQL (spatial/non-spatial) or graphical editing of non-spatial tables.
Importing flat (non-spatial) data to the database
1) Convert your source data (e.g. from Excel) to .csv files
- Create a .csv (comma-separated values) text file from the file format that you have. Excel: Save As --> choose CSV --> Save.
- Open your text file in Notepad++ to check the structure and see what the data looks like.
2) Use DBeaver's Import Data to import .csv files
- Under your userxx/projectxx connection, right-click the Tables folder in the DBeaver GUI --> Import Data
- Source Type and Format --> Next
- Input File(s) --> Click on the source to add your .csv file --> Next
- Adjust the Column delimiter
- Tables mapping --> Click the New... button to create a new database table --> e.g. tmp_contmeasurements
- Tables mapping --> Click the Columns... button to create the columns --> Give simple names (no special characters) to target columns, or skip (drop-down) unneeded ones. Use the varchar datatype with sufficient length (e.g. 256) for strings. Use varchar strings for dates/datetimes, too. For numbers use the numeric datatype. Don't worry about these generic datatypes in your temporary table! You can later CAST these string/number columns (the data, respectively) from this table to the desired datatypes in your destination table. --> Next
- Settings --> Next
- Confirm --> Start
- All this will create a new table tmp_contmeasurements with varchar/numeric fields filled with your data from the .csv file.
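From here, the generic varchar/numeric columns can be CAST into the destination table; a sketch with hypothetical destination table and column names:

```sql
-- Move the imported rows into the (hypothetical) destination table,
-- casting the generic tmp datatypes to the designed ones on the fly.
INSERT INTO projectxx.measurement (station_id, measured_at, value)
SELECT
    CAST(stationid AS integer),
    CAST(measuredat AS timestamp),   -- varchar like '2021-03-01 12:00'
    CAST(value AS numeric)
FROM projectxx.tmp_contmeasurements;
```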
3) Generate data in your (non-spatial) destination tables
- Enter further data - as needed - to all your database tables.
- Use DBeaver with direct SQL or graphical editing with the GUI.