Project Swiss Water Resources
Introduction
This project organizes data about Swiss water resources for researchers and data analysts. Multiple collaborators will analyse datasets that are centrally hosted and organised by your project team. Geographical information about rivers and stations, as well as regional surface information, is tightly coupled to measurements and publicly available sensor data. The relevant open data will be processed, stored and managed in a suitable manner in a database management system such as PostgreSQL/PostGIS. Additionally, you will create and store spatial and non-spatial information yourself.
Description
Description, requirement analysis, instructions and tasks are found in this project document on the server.
Reporting
How long should the report be (approximate page number)?
- The number of pages is not meaningful by itself; what matters is the completeness and comprehensibility of the document.
Should we read relevant literature about the topic and mention them in the project description?
- In principle, no additional literature is required for the project report. That said, the basic database book by Elmasri & Navathe is a good investment for the course and the future (any edition is worth the money). For the spatial part we recommend PostGIS in Action, 2nd edition.
Data analysis for ER and DB design
We recommend having a close look at the datasets that will later be imported into the database. Find more information in the previously mentioned project document. We used the following tools and methods to do this and to create an ER model. We used this template file (PowerPoint) to draw our spatial ER model.
1) Starting the analysis (based on source data files alone)
- Download the data needed (csv, xlsx, shape, gpkg, ...);
- Visualise spatial data in a new QGIS project
- Open .csv files in Notepad++ or a similar text editor
- Check information (metadata) on website(s)
- Try to find out the potential datatypes used in these datasets
- Check values and consistency for each field/column in the records/rows. Convenient functionality:
- QGIS -> Choose Layer -> Properties -> Fields
- QGIS -> Choose Layer -> Open Attribute Table: Sort each attribute to see what kind of data we really have.
- Text editor Notepad++: header text, data rows, sorting, missing or NULL or N/A values (which all mean the same thing).
- Do you have trouble with special characters such as ä or ü not displaying correctly in a shapefile column? Solution: add another text file with the file extension .cpg to your other 'shape files' (such as .shp, .dbf, .prj, ...). This .cpg file must have the same base name as the other files. Open the .cpg file, type UTF-8, save and close.
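The encoding fix above can also be scripted; a minimal Python sketch, assuming a hypothetical shapefile base name of rivers (replace with the base name of your own .shp/.dbf/.prj files):

```python
from pathlib import Path

# Hypothetical base name; your shapefile parts are rivers.shp,
# rivers.dbf, rivers.prj, so the sidecar must be rivers.cpg.
cpg = Path("rivers.cpg")

# A .cpg file containing just 'UTF-8' tells GIS software the attribute
# encoding of the .dbf, so characters like ä and ü display correctly.
cpg.write_text("UTF-8")
print(cpg.read_text())  # UTF-8
```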
2) During dataset analysis (still file based)
- Does the data definitely fit to the datatype expected or declared?
- Does the metadata give the correct information?
- What is the precise meaning of each column?
- Do I have missing, NULL, N/A values in geometry columns (and other columns)? What to do with these?
- Do we have to normalize? Are there hidden entity types (-> functional dependencies...) in the data?
- How can this data be fitted into an Entity Relationship model (ER)?
- What would - in a following step - the database datatypes (of each column/attribute) be?
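The checks above can be scripted for a quick first pass over a file; a minimal Python sketch using an invented sample standing in for one of the project CSV files (all column names are hypothetical):

```python
import csv
from io import StringIO

# Hypothetical sample data; in practice, open your real .csv file.
sample = """station;value;measured_on
STA-001;12.5;2021-03-01
STA-002;N/A;2021-03-01
STA-003;13.1;
"""

def profile_columns(text):
    """Collect the distinct raw values per column, to spot
    missing/NULL/N/A entries and guess suitable datatypes."""
    reader = csv.DictReader(StringIO(text), delimiter=";")
    values = {}
    for row in reader:
        for col, val in row.items():
            values.setdefault(col, set()).add(val)
    return values

profile = profile_columns(sample)
# 'N/A' and '' in these columns signal missing data to deal with.
print(sorted(profile["value"]))        # ['12.5', '13.1', 'N/A']
print(sorted(profile["measured_on"]))  # ['', '2021-03-01']
```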
3) Based on this analysis and the project description and requirements
- Build up your ER diagram
- Derive the relations (primary key fields; foreign keys)
- Design the database tables (with primary key, possibly foreign keys, datatypes). Hint: choose datatypes suitable for tables in PostgreSQL with PostGIS
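As a sketch of what such a destination table could look like in PostgreSQL/PostGIS — the entity, the column names and the projectxx schema are hypothetical and must follow your own ER design:

```sql
-- Hypothetical destination table for a "station" entity; adjust names,
-- types and constraints to your own design.
CREATE TABLE projectxx.station (
    station_id   integer PRIMARY KEY,
    name         varchar(256) NOT NULL,
    river_id     integer REFERENCES projectxx.river (river_id),
    geom         geometry(Point, 2056)   -- CH1903+/LV95 point geometry
);
```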
Importing spatial data to the database
1) Reproject your file-based spatial datasets if necessary
- Your project AND your datasets should be in EPSG:2056 (CH1903+/LV95)
- Reproject your file-based datasets (i.e. create a reprojected file copy with the 'Reproject layer' tool) to avoid later database import problems. Reason: QGIS projects can do visual on-the-fly reprojection of file-based datasets, but some database importers/formats cannot deal with it correctly.
- By the way: spatial datasets in the database with an undesired projection can be reprojected with SQL at any time! See FAQ.
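Such an in-database reprojection can be sketched as follows, assuming a hypothetical temporary table tmp_river whose geometry arrived in WGS84 (EPSG:4326):

```sql
-- Convert the geometry column to EPSG:2056, transforming every row.
ALTER TABLE projectxx.tmp_river
    ALTER COLUMN geom TYPE geometry(MultiLineString, 2056)
    USING ST_Transform(geom, 2056);
```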
2) Import file based spatial datasets to temporary spatial database tables
- Import the reprojected spatial datasets without any further changes to spatial tables in the database (use your projectxx schema). You should use a prefix in the table names to mark their temporary status, e.g. tmp_river. Recommendations concerning QGIS 3.28:
- Three possibilities are available in QGIS: 1) From QGIS top level menu: Database -> DB Manager Tool: Import Layer/File; 2) Processing Toolbox: Export to PostgreSQL; 3) Some datasets or formats work with Processing Toolbox: gdal PostgreSQL database exporter;
- Some of our datasets won't import correctly into the PostGIS database
- Timestamp/date datatypes in GeoPackages (.gpkg) do not work; use the FileGDB (file geodatabase) format instead, or convert the GeoPackage or CSV to a shapefile; after this, date fields will import as strings;
- Heterogeneous geometries (MultiLineStrings and LineStrings together in the same file-based spatial dataset) do not import completely; the Multipart to Singlepart tool (Processing Toolbox) lets you convert such GeoPackages to homogeneous singlepart features; the import to PostGIS should work afterwards;
3) Transfer and convert the data from the temporary tables to the destination spatial tables
- Create empty destination tables in the database reflecting your DB design.
- Transfer and convert data by using SQL statements from your temporary tables to destination tables.
- Datatypes which do not fit can be CAST in SQL on the fly! Example: a string field containing only numbers can be cast to an integer
- Write SQL queries to SELECT and CAST and transform data from your temporary tables. Use INSERT INTO to transfer the data to your destination tables. See example(s) here and in the lecture.
- Flatten a 3D geometry field to 2D directly in the source table and constrain to SRID 2056
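The transfer steps above can be sketched in SQL; all table and column names here are hypothetical and must match your own temporary and destination tables:

```sql
-- Transfer and convert: SELECT from the temporary table, CAST text
-- fields, flatten 3D geometries, and INSERT into the destination table.
INSERT INTO projectxx.river (river_id, name, length_km, geom)
SELECT
    CAST(gid AS integer),                -- text containing numbers -> integer
    name,
    CAST(length_km AS numeric),
    ST_SetSRID(ST_Force2D(geom), 2056)   -- drop Z, constrain to SRID 2056
FROM projectxx.tmp_river;
```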
4) Generate your own spatial data with attributes in your destination spatial tables
- Enter further data - as needed - to all your database tables;
- Use QGIS' editing features (try to find out how they work) to create or digitize new features (spatial records) directly in your spatial tables in the database.
- Use DBeaver with direct SQL (spatial/non-spatial) or graphical editing of non-spatial tables.
Importing flat (non-spatial) data to the database
1) Convert your source data (e.g. from Excel) to .csv files
- Create a .csv (comma-separated values) text file from the file format that you have. Excel: Save As --> choose CSV --> Save.
- Open your text file in Notepad++ to check the structure and see what the data looks like.
2) Use DBeaver's Import Data to import .csv files
- Under your userxx/projectxx connection, right-click the Tables folder in the DBeaver GUI --> Import Data
- Source Type and Format --> Next
- Input File(s) --> Click on the source to add your .csv file --> Next
- Adjust the Column delimiter
- Tables mapping --> Click the New... button to create a new database table --> e.g. tmp_contmeasurements
- Tables mapping --> Click the Columns... button to create the columns --> Give simple names (no special characters) to target columns, or skip (drop-down) unneeded ones. Use the varchar datatype with sufficient length (e.g. 256) for strings. Use varchar strings for dates/datetimes, too. For numbers use the numeric datatype. Don't worry about these generic datatypes in your temporary table! You can later CAST these string/number columns (the data, respectively) from this table to the desired datatypes in your destination table. --> Next
- Settings --> Next
- Confirm --> Start
- All this will create a new table tmp_contmeasurements with varchar/numeric fields filled with your data from the .csv file.
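From here, the generic varchar/numeric columns can be CAST into the destination table; a sketch with hypothetical destination table and column names:

```sql
-- Move the imported rows into the (hypothetical) destination table,
-- casting the generic tmp datatypes to the designed ones on the fly.
INSERT INTO projectxx.measurement (station_id, measured_at, value)
SELECT
    CAST(stationid AS integer),
    CAST(measuredat AS timestamp),   -- varchar like '2021-03-01 12:00'
    CAST(value AS numeric)
FROM projectxx.tmp_contmeasurements;
```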
3) Generate data in your (non-spatial) destination tables
- Enter further data - as needed - to all your database tables.
- Use DBeaver with direct SQL or graphical editing with the GUI.