Click or drag to resize

GTFS files integration

[This is preliminary documentation and is subject to change.]

ABOUT THIS GUIDE

The purpose of this tutorial is to help you understand how to build an ETL application with Eety. In this case, csv files are loaded into a Sql server database. You will also learn how to set bulk mode to speed up the overall process.

GETTING STARTED

DOWNLOAD CSV FILEs FROM SNCF DEVELOPER WEB SITE

CREATE THE SQL SERVER DATABASE

This topic describes how to create a database in SQL Server by using SQL Server Management Studio

To create the sql server database:

  1. In Object Explorer, connect to an instance of the SQL Server Database Engine and then expand that instance.

  2. Right-click Databases, and then click New Database.

  3. In New Database, enter Gtfs name.

  4. Load and run the Gtfs.sql script.

CREATE A NEW SOLUTION

Create an application solution:

  1. Launch Eety Desktop.

  2. Click NEW in the file toolbar.

  3. Optionally, change the destination folder in the file dialog box.

  4. Fill in a name for the solution file (without any .eety extension) in the file name text box.

  5. Click Open.

    The solution workspace opens its Solution Explorer on the right side and a start page in a New Solution window.

    This start page simply depicts steps to walk through building an application:

    • Add source and target connections

    • Map sources and targets with the data transfer model

    • Add an analysis workflow

openfiledialog
STEP 1: ADD SOURCE AND TARGET CONNECTIONS

DELIMITED TEXT FILES

  • Sncf lets you download a zip of gtfs CSV (comma separated values) files:

    1. agency.txt

    2. calendar.txt

    3. calendar_dates.txt

    4. routes.txt

    5. stop_times.txt

    6. stops.txt

    7. trips.txt

  • Eety Eety provides a dsv (delimiter separated values) connector to extract these quotes.

ADD DELIMITED TEXT FILES CONNECTION

  • To create agency.txt file connection:

    1. Right click on Connections node, in Solution Explorer pane.

    2. Expand the Add connection menu.

    3. Select Delimited Text File.

    4. Fill the dialog box with the relevant values:

      File Tab

      Parameters

      Values

      Name

      gtfs agency

      File

      gtfs-lines-last\agency.txt

      Schema path

      Table Name

      Agency

      Culture

      English (united states)

      Encoding

      Schema on first row

      Yes

      Text Qualifier

      False

      Column delimiter

      ,

      Row delimiter

      CRLF

      Schema

      Comments

      Columns Tab

      Column

      Type

      Format

      Is Key

      agency_id

      String

      True

      agency_name

      String

      agency_url

      String

      agency_timezone

      String

      agency_lang

      String

    Example 3 CSV Agency 1
    Example 3 CSV Agency 2
  • To create calendar.txt file connection:

    1. Right click on Connections node, in Solution Explorer pane.

    2. Expand the Add connection menu.

    3. Select Delimited Text File.

    4. Fill the dialog box with the relevant values:

      File Tab

      Parameters

      Values

      Name

      gtfs calendar

      File

      gtfs-lines-last\calendar.txt

      Schema path

      Table Name

      Calendar

      Culture

      English (united states)

      Encoding

      Schema on first row

      Yes

      Text Qualifier

      False

      Column delimiter

      ,

      Row delimiter

      CRLF

      Schema

      Comments

      Columns Tab

      Column

      Type

      Format

      Is Key

      service_id

      Int32

      True

      monday

      Boolean

      tuesday

      Boolean

      wednesday

      Boolean

      thursday

      Boolean

      friday

      Boolean

      saturday

      Boolean

      sunday

      Boolean

      start_date

      DateTime

      yyyyMMdd

      end_date

      DateTime

      yyyyMMdd

    Example 3 CSV Calendar 1
    Example 3 CSV Calendar 2
  • To create calendar_dates.txt file connection:

    1. Right click on Connections node, in Solution Explorer pane.

    2. Expand the Add connection menu.

    3. Select Delimited Text File.

    4. Fill the dialog box with the relevant values:

      File Tab

      Parameters

      Values

      Name

      gtfs calendar dates

      File

      gtfs-lines-last\calendar_dates.txt

      Schema path

      Table Name

      CalendarDates

      Culture

      English (united states)

      Encoding

      Schema on first row

      Yes

      Text Qualifier

      False

      Column delimiter

      ,

      Row delimiter

      CRLF

      Schema

      Comments

      Columns Tab

      Column

      Type

      Format

      Is Key

      service_id

      Int32

      True

      date

      DateTime

      yyyyMMdd

      True

      exception_type

      Byte

    Example 3 CSV Calendar Dates 1
    Example 3 CSV Calendar Dates 2
  • To create routes.txt file connection:

    1. Right click on Connections node, in Solution Explorer pane.

    2. Expand the Add connection menu.

    3. Select Delimited Text File.

    4. Fill the dialog box with the relevant values:

      File Tab

      Parameters

      Values

      Name

      gtfs routes

      File

      gtfs-lines-last\routes.txt

      Schema path

      Table Name

      Routes

      Culture

      English (united states)

      Encoding

      Schema on first row

      Yes

      Text Qualifier

      False

      Column delimiter

      ,

      Row delimiter

      CRLF

      Schema

      Comments

      Columns Tab

      Column

      Type

      Nullable

      Is Key

      route_id

      String

      True

      agency_id

      String

      True

      route_short_name

      String

      route_long_name

      String

      route_desc

      String

      route_type

      Int16

      route_url

      String

      True

      route_color

      String

      True

      route_text_color

      String

      True

    Example 3 CSV Routes 1
    Example 3 CSV Routes 2
  • To create stop_times.txt file connection:

    1. Right click on Connections node, in Solution Explorer pane.

    2. Expand the Add connection menu.

    3. Select Delimited Text File.

    4. Fill the dialog box with the relevant values:

      File Tab

      Parameters

      Values

      Name

      gtfs stop times

      File

      gtfs-lines-last\stop_times.txt

      Schema path

      Table Name

      StopTimes

      Culture

      English (united states)

      Encoding

      Schema on first row

      Yes

      Text Qualifier

      False

      Column delimiter

      ,

      Row delimiter

      CRLF

      Schema

      Comments

      Columns Tab

      Column

      Type

      Nullable

      Is Key

      trip_id

      String

      arrival_time

      String

      departure_time

      String

      stop_id

      String

      stop_sequence

      String

      stop_headsign

      String

      True

      pickup_type

      String

      True

      drop_off_type

      String

      True

    Example 3 CSV Stop Times 1
    Example 3 CSV Stop Times 2
  • To create stops.txt file connection:

    1. Right click on Connections node, in Solution Explorer pane.

    2. Expand the Add connection menu.

    3. Select Delimited Text File.

    4. Fill the dialog box with the relevant values:

      File Tab

      Parameters

      Values

      Name

      gtfs stops

      File

      gtfs-lines-last\stops.txt

      Schema path

      Table Name

      Stops

      Culture

      English (united states)

      Encoding

      Schema on first row

      Yes

      Text Qualifier

      False

      Column delimiter

      ,

      Row delimiter

      CRLF

      Schema

      Comments

      Columns Tab

      Column

      Type

      Format

      Is Key

      stop_id

      String

      True

      stop_name

      String

      stop_desc

      String

      stop_lat

      Double

      stop_lon

      Double

      zone_id

      String

      zone_id

      String

      stop_url

      String

      location_type

      Byte

      parent_station

      String

    Example 3 CSV Stops 1
    Example 3 CSV Stops 2
  • To create transfers.txt file connection:

    1. Right click on Connections node, in Solution Explorer pane.

    2. Expand the Add connection menu.

    3. Select Delimited Text File.

    4. Fill the dialog box with the relevant values:

      File Tab

      Parameters

      Values

      Name

      gtfs transfers

      File

      gtfs-lines-last\transfers.txt

      Schema path

      Table Name

      Transfers

      Culture

      English (united states)

      Encoding

      Schema on first row

      Yes

      Text Qualifier

      False

      Column delimiter

      ,

      Row delimiter

      CRLF

      Schema

      Comments

      Columns Tab

      Column

      Type

      Nullable

      Is Key

      from_stop_id

      String

      to_stop_id

      String

      transfer_type

      Byte

      min_transfer_time

      Int32

    Example 3 CSV Transfer 1
    Example 3 CSV Transfer 2
  • To create trips.txt file connection:

    1. Right click on Connections node, in Solution Explorer pane.

    2. Expand the Add connection menu.

    3. Select Delimited Text File.

    4. Fill the dialog box with the relevant values:

      File Tab

      Parameters

      Values

      Name

      gtfs trips

      File

      gtfs-lines-last\trips.txt

      Schema path

      Table Name

      Trips

      Culture

      English (united states)

      Encoding

      Schema on first row

      Yes

      Text Qualifier

      False

      Column delimiter

      ,

      Row delimiter

      CRLF

      Schema

      Comments

      Columns Tab

      Column

      Type

      Format

      Is Key

      route_id

      String

      True

      service_id

      String

      trip_id

      String

      trip_headsign

      String

      True

      direction_id

      String

      block_id

      String

    Example 3 CSV Trips 1
    Example 3 CSV Trips 2
STEP 2: MAP SOURCES AND TARGETS WITH THE LOGICAL MODEL

Basic concepts

One single model shares its elements with all the components in the application. This is a logical business model with tables, columns and relationships mapped to the data sources corresponding elements. At runtime, data extracted from the data sources is stored into these elements, starting an integration or analysis pipeline.

In this example, we will:

  1. Map the CSV files as extraction source.

  2. Map the SQL Server datasource as loading destination.

MAP THE CSV Files

To create the extraction mapping:

  1. Right click on the Mappings node, in the Solution Explorer pane.

  2. Add a new Extraction type mapping.

To create and map a new model:

  1. Optionally, double click the New mapping node in the Solution Explorer pane.

  2. Optionally, click show-help 24x 24to open the help popup.

  3. Click on the create and map a new transfer model to the selected data source label.

Example 3 Mapping 1
Note Note

Right click the New mapping node in Solution Explorer and rename it Csv.

Right click the model node in top of Solution Explorer and rename it Gtfs.

To create and map a new Calendar table:

  1. Optionally double click Csv mapping in Solution Explorer.

  2. Click Gtfs node in Transfer model.

  3. Expand gtfs calendar in Data Sources.

  4. Expand gtfs calendar in Data Sources.

  5. Click the Calendar table below gtfs calendar.

  6. Click add+map-classifier 24x 24 in Extract mapping toolbar to create and map a new class.

Example 3 Mapping 2
Note Note

Right click on the New mapping node in Solution Explorer (or select it and press F2) and rename it Log Files.

To create and map other tables:

Repeat 1 to 5 steps with gtfs calendar dates, gtfs routes, gtfs stop times, gtfs stops, gtfs trips.

MAP THE SQL SERVER DATABASE

To create the loading mapping:

  1. Right click on the Mappings node, in the Solution Explorer pane.

  2. Add a new >Loading type mapping.

To map the transfer model:

  1. Optionally, optionally close help popup.

  2. Click Gtfs node in Transfer model.

  3. Expand local sql server in Data Sources.

  4. Click below the Gtfs database

  5. Click map-model 24x 24 in the mapping pane toolbar to map the data transfer model to the selected data source.

Example 3 Mapping 3
Note Note

Right click the New mapping node in Solution Explorer and rename it Sql Server.

STEP 3: ADD AN ETL (EXTRACT,TRANSFORM AND LOAD) WORKFLOW

To add an analysis workflow:

  1. Right click on Workflows, in Solution Explorer.

  2. Add an analysis worfklow.

  3. Right click the new workflow node and rename it Analysis.

Example 2 Workflow
STEP 4: SET UP THE LOADING PHASE

Basic concepts

A trigger is a parameterized conditional procedure automatically executed on data flow events.

  • The Delete All Data trigger will automatically delete all the records in the Sql server database to prevent a primary key violation exception.

To add a delete data trigger:

  1. Expand the ETL node in the Solution Explorer pane and double click on the Loading node.

  2. Select the Gtfs source in the left pane tree of the Extraction Stage.

  3. Double click on trigger-datasourcetruncate button in the bottom toolbar of the center pane.

    Example 3 Loading Delete 1
STEP 5: TEST THE APPLICATION

To test the application

  1. Click debugger 16x 16 in Solution Explorer toolbar.

  2. Click workflow-start 16x 16 in Test New Application.

  3. A default browser instance should be started.