DataBricks and PySpark

by Mark Nielsen
Copyright August 2023



  1. Links
  2. Get DataBricks 14 day evaluation
  3. Saving passwords
  4. Expect and automation


Links



DataBricks 14 day trial

If you have problems deleting and recreating workspaces...

To get credentials for the next part:
https://docs.databricks.com/en/integrations/jdbc-odbc-bi.html

Create a token.
https://docs.databricks.com/en/dev-tools/auth.html

Get the other credentials: the hostname and HTTP path.
https://docs.databricks.com/en/integrations/jdbc-odbc-bi.html
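
The three values collected above (hostname, HTTP path, token) are easier to reuse if they are kept out of the scripts themselves. The sketch below is one possible way to do that, reading them from environment variables. The variable names are assumptions chosen for illustration, not something these notes or Databricks require.

```python
import os

# Hypothetical environment variable names; rename them to fit your setup.
REQUIRED_VARS = {
    "server_hostname": "DATABRICKS_SERVER_HOSTNAME",
    "http_path": "DATABRICKS_HTTP_PATH",
    "access_token": "DATABRICKS_TOKEN",
}


def load_credentials(env=os.environ):
    """Return the three connection values, or raise if any is missing."""
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in REQUIRED_VARS.items()}
```

Calling load_credentials() at the top of a script fails early with a readable message if one of the values was never exported.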



ODBC and Python

I had to use ODBC on my AWS EC2 server because it runs an older version of Ubuntu and I had other things running on it. I didn't want to upgrade.

https://docs.databricks.com/en/dev-tools/pyodbc.html
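
As a rough illustration of the ODBC route, the sketch below runs one query through pyodbc. It assumes the ODBC driver is already installed and that a DSN has been defined in odbc.ini. The DSN name "Databricks" and the function name are placeholders, not values taken from these notes.

```python
def query_over_odbc(query, dsn_name="Databricks"):
    """Run one query through an ODBC DSN and return all rows."""
    # Imported here so this file still loads before pyodbc is installed.
    import pyodbc

    # autocommit=True so pyodbc does not try to open a transaction.
    connection = pyodbc.connect(f"DSN={dsn_name}", autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        # Always release the connection, even if the query fails.
        connection.close()
```

A first smoke test could be query_over_odbc("SELECT 1"), which only needs a working connection and no tables.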



Python driver for Ubuntu

I did this on a laptop running the latest Linux Mint, which is based on the latest Ubuntu. The Python driver worked there.

https://docs.databricks.com/en/dev-tools/python-sql-connector.html

First, set up the environment.
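
With the environment in place and the databricks-sql-connector package installed through pip, a connection with the Python driver might look like the sketch below. The function name is made up for illustration. The three arguments are the hostname, HTTP path, and token gathered in the Links section.

```python
def query_databricks(query, server_hostname, http_path, access_token):
    """Run one query through the Databricks SQL Connector and return all rows."""
    # Imported here so this file still loads before the package is installed.
    from databricks import sql

    # Both the connection and the cursor are closed when their blocks exit.
    with sql.connect(
        server_hostname=server_hostname,
        http_path=http_path,
        access_token=access_token,
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchall()
```

Running query_databricks("SELECT 1", hostname, http_path, token) is a quick way to confirm that the credentials and the network path both work before trying real tables.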



ODBC driver for Ubuntu

https://docs.databricks.com/en/integrations/jdbc-odbc-bi.html



Installing Python and other software

https://docs.databricks.com/en/dev-tools/python-sql-connector.html

I am using an Ubuntu EC2 server to connect.



Load data into your MySQL server on EC2

You could use another source, like RDS MySQL or RDS Aurora, but in this case we are using an EC2 server running MySQL.