author     Nate Sesti <33237525+sestinj@users.noreply.github.com>  2023-10-09 18:37:27 -0700
committer  GitHub <noreply@github.com>  2023-10-09 18:37:27 -0700
commit     f09150617ed2454f3074bcf93f53aae5ae637d40 (patch)
tree       5cfe614a64d921dfe58b049f426d67a8b832c71f /server/continuedev/plugins/recipes/DDtoBQRecipe
parent     985304a213f620cdff3f8f65f74ed7e3b79be29d (diff)
download   sncontinue-f09150617ed2454f3074bcf93f53aae5ae637d40.tar.gz sncontinue-f09150617ed2454f3074bcf93f53aae5ae637d40.tar.bz2 sncontinue-f09150617ed2454f3074bcf93f53aae5ae637d40.zip
Preview (#541)
* Strong typing (#533)
* refactor: :recycle: get rid of continuedev.src.continuedev structure
* refactor: :recycle: switching back to server folder
* feat: :sparkles: make config.py imports shorter
* feat: :bookmark: publish as pre-release vscode extension
* refactor: :recycle: refactor and add more completion params to ui
* build: :building_construction: download from preview S3
* fix: :bug: fix paths
* fix: :green_heart: package:pre-release
* ci: :green_heart: more time for tests
* fix: :green_heart: fix build scripts
* fix: :bug: fix import in run.py
* fix: :bookmark: update version to try again
* ci: 💚 Update package.json version [skip ci]
* refactor: :fire: don't check for old extensions version
* fix: :bug: small bug fixes
* fix: :bug: fix config.py import paths
* ci: 💚 Update package.json version [skip ci]
* ci: :green_heart: platform-specific builds test #1
* feat: :green_heart: ship with binary
* fix: :green_heart: fix copy statement to include .exe for windows
* fix: :green_heart: cd extension before packaging
* chore: :loud_sound: count tokens generated
* fix: :green_heart: remove npm_config_arch
* fix: :green_heart: publish as pre-release!
* chore: :bookmark: update version
* perf: :green_heart: hardcode distro paths
* fix: :bug: fix yaml syntax error
* chore: :bookmark: update version
* fix: :green_heart: update permissions and version
* feat: :bug: kill old server if needed
* feat: :lipstick: update marketplace icon for pre-release
* ci: 💚 Update package.json version [skip ci]
* feat: :sparkles: auto-reload for config.py
* feat: :wrench: update default config.py imports
* feat: :sparkles: codelens in config.py
* feat: :sparkles: select model param count from UI
* ci: 💚 Update package.json version [skip ci]
* feat: :sparkles: more model options, ollama error handling
* perf: :zap: don't show server loading immediately
* fix: :bug: fixing small UI details
* ci: 💚 Update package.json version [skip ci]
* feat: :rocket: headers param on LLM class
* fix: :bug: fix headers for openai.py
* feat: :sparkles: highlight code on cmd+shift+L
* ci: 💚 Update package.json version [skip ci]
* feat: :lipstick: sticky top bar in gui.tsx
* fix: :loud_sound: websocket logging and horizontal scrollbar
* ci: 💚 Update package.json version [skip ci]
* feat: :sparkles: allow AzureOpenAI Service through GGML
* ci: 💚 Update package.json version [skip ci]
* fix: :bug: fix automigration
* ci: 💚 Update package.json version [skip ci]
* ci: :green_heart: upload binaries in ci, download apple silicon
* chore: :fire: remove notes
* fix: :green_heart: use curl to download binary
* fix: :green_heart: set permissions on apple silicon binary
* fix: :green_heart: testing
* fix: :green_heart: cleanup file
* fix: :green_heart: fix preview.yaml
* fix: :green_heart: only upload once per binary
* fix: :green_heart: install rosetta
* ci: :green_heart: download binary after tests
* ci: 💚 Update package.json version [skip ci]
* ci: :green_heart: prepare ci for merge to main
---------
Co-authored-by: GitHub Action <action@github.com>
Diffstat (limited to 'server/continuedev/plugins/recipes/DDtoBQRecipe')
4 files changed, 238 insertions, 0 deletions
diff --git a/server/continuedev/plugins/recipes/DDtoBQRecipe/README.md b/server/continuedev/plugins/recipes/DDtoBQRecipe/README.md
new file mode 100644
index 00000000..d50324f7
--- /dev/null
+++ b/server/continuedev/plugins/recipes/DDtoBQRecipe/README.md
@@ -0,0 +1,3 @@
+# DDtoBQRecipe
+
+Move from using DuckDB to Google BigQuery as the destination for your `dlt` pipeline
diff --git a/server/continuedev/plugins/recipes/DDtoBQRecipe/dlt_duckdb_to_bigquery_docs.md b/server/continuedev/plugins/recipes/DDtoBQRecipe/dlt_duckdb_to_bigquery_docs.md
new file mode 100644
index 00000000..eb68e117
--- /dev/null
+++ b/server/continuedev/plugins/recipes/DDtoBQRecipe/dlt_duckdb_to_bigquery_docs.md
@@ -0,0 +1,85 @@
+### Credentials Missing: ConfigFieldMissingException
+
+You will see this exception if `dlt` cannot find your BigQuery credentials. In the exception below, all of them (`project_id`, `private_key`, `client_email`) are missing. The exception also lists every configuration lookup that was performed; [here we explain how to read that list](run-a-pipeline.md#missing-secret-or-configuration-values).
+
+```
+dlt.common.configuration.exceptions.ConfigFieldMissingException: Following fields are missing: ['project_id', 'private_key', 'client_email'] in configuration with spec GcpServiceAccountCredentials
+  for field "project_id" config providers and keys were tried in following order:
+    In Environment Variables key WEATHERAPI__DESTINATION__BIGQUERY__CREDENTIALS__PROJECT_ID was not found.
+    In Environment Variables key WEATHERAPI__DESTINATION__CREDENTIALS__PROJECT_ID was not found.
+```
+
+The most common causes of this exception:
+
+1. The secrets are not in `secrets.toml` at all.
+2. They are placed in the wrong section. For example, the fragment below will not work:
+
+```toml
+[destination.bigquery]
+project_id = "project_id"  # please set me up!
+```
+
+3. You run the pipeline script from a **different** folder than the one it is saved in.
+For example, `python weatherapi_demo/weatherapi.py` runs the script from the `weatherapi_demo` folder, but the current working directory is the folder above it. This prevents `dlt` from finding `weatherapi_demo/.dlt/secrets.toml` and filling in credentials.
+
+### Placeholders still in secrets.toml
+
+Here BigQuery complains that the format of the `private_key` is incorrect. In practice this most often happens when you forget to replace the placeholders in `secrets.toml` with real values:
+
+```
+<class 'dlt.destinations.exceptions.DestinationConnectionError'>
+Connection with BigQuerySqlClient to dataset name weatherapi_data failed. Please check if you configured the credentials at all and provided the right credentials values. You can be also denied access or your internet connection may be down. The actual reason given is: No key could be detected.
+```
+
+### BigQuery not enabled
+
+[You must enable the BigQuery API.](https://console.cloud.google.com/apis/dashboard)
+
+```
+<class 'google.api_core.exceptions.Forbidden'>
+403 POST https://bigquery.googleapis.com/bigquery/v2/projects/bq-walkthrough/jobs?prettyPrint=false: BigQuery API has not been used in project 364286133232 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/bigquery.googleapis.com/overview?project=364286133232 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.
+
+Location: EU
+Job ID: a5f84253-3c10-428b-b2c8-1a09b22af9b2
+ [{'@type': 'type.googleapis.com/google.rpc.Help', 'links': [{'description': 'Google developers console API activation', 'url': 'https://console.developers.google.com/apis/api/bigquery.googleapis.com/overview?project=364286133232'}]}, {'@type': 'type.googleapis.com/google.rpc.ErrorInfo', 'reason': 'SERVICE_DISABLED', 'domain': 'googleapis.com', 'metadata': {'service': 'bigquery.googleapis.com', 'consumer': 'projects/364286133232'}}]
+```
+
+### Lack of permissions to create jobs
+
+Add the `BigQuery Job User` role as described on the [destination page](../destinations/bigquery.md).
+
+```
+<class 'google.api_core.exceptions.Forbidden'>
+403 POST https://bigquery.googleapis.com/bigquery/v2/projects/bq-walkthrough/jobs?prettyPrint=false: Access Denied: Project bq-walkthrough: User does not have bigquery.jobs.create permission in project bq-walkthrough.
+
+Location: EU
+Job ID: c1476d2c-883c-43f7-a5fe-73db195e7bcd
+```
+
+### Lack of permissions to query/write data
+
+Add the `BigQuery Data Editor` role as described on the [destination page](../destinations/bigquery.md).
+
+```
+<class 'dlt.destinations.exceptions.DatabaseTransientException'>
+403 Access Denied: Table bq-walkthrough:weatherapi_data._dlt_loads: User does not have permission to query table bq-walkthrough:weatherapi_data._dlt_loads, or perhaps it does not exist in location EU.
+
+Location: EU
+Job ID: 299a92a3-7761-45dd-a433-79fdeb0c1a46
+```
+
+### Lack of billing / BigQuery in sandbox mode
+
+`dlt` does not support BigQuery when the project has no billing enabled. If you see a stack trace in which the following warning appears:
+
+```
+<class 'dlt.destinations.exceptions.DatabaseTransientException'>
+403 Billing has not been enabled for this project. Enable billing at https://console.cloud.google.com/billing. DML queries are not allowed in the free tier. Set up a billing account to remove this restriction.
+```
+
+or
+
+```
+2023-06-08 16:16:26,769|[WARNING ]|8096|dlt|load.py|complete_jobs:198|Job for weatherapi_resource_83b8ac9e98_4_jsonl retried in load 1686233775.932288 with message {"error_result":{"reason":"billingNotEnabled","message":"Billing has not been enabled for this project. Enable billing at https://console.cloud.google.com/billing. Table expiration time must be less than 60 days while in sandbox mode."},"errors":[{"reason":"billingNotEnabled","message":"Billing has not been enabled for this project. Enable billing at https://console.cloud.google.com/billing. Table expiration time must be less than 60 days while in sandbox mode."}],"job_start":"2023-06-08T14:16:26.850000Z","job_end":"2023-06-08T14:16:26.850000Z","job_id":"weatherapi_resource_83b8ac9e98_4_jsonl"}
+```
+
+you must enable billing.
diff --git a/server/continuedev/plugins/recipes/DDtoBQRecipe/main.py b/server/continuedev/plugins/recipes/DDtoBQRecipe/main.py
new file mode 100644
index 00000000..65149500
--- /dev/null
+++ b/server/continuedev/plugins/recipes/DDtoBQRecipe/main.py
@@ -0,0 +1,31 @@
+from textwrap import dedent
+
+from ....core.main import Step
+from ....core.sdk import ContinueSDK
+from ....core.steps import MessageStep
+from .steps import LoadDataStep, SetUpChessPipelineStep, SwitchDestinationStep
+
+# Based on the following guide:
+# https://github.com/dlt-hub/dlt/pull/392
+
+
+class DDtoBQRecipe(Step):
+    hide: bool = True
+
+    async def run(self, sdk: ContinueSDK):
+        await sdk.run_step(
+            MessageStep(
+                name="Move from using DuckDB to Google BigQuery as the destination",
+                message=dedent(
+                    """\
+                    This recipe will walk you through the process of moving from using DuckDB to Google BigQuery as the destination for your dlt pipeline.
+                    With the help of Continue, you will:
+                    - Set up a dlt pipeline for the chess.com API
+                    - Switch destination from DuckDB to Google BigQuery
+                    - Add BigQuery credentials to your secrets.toml file
+                    - Run the pipeline again to load data to BigQuery"""
+                ),
+            )
+            >> SetUpChessPipelineStep()
+            >> SwitchDestinationStep()
+            >> LoadDataStep()
+        )
diff --git a/server/continuedev/plugins/recipes/DDtoBQRecipe/steps.py b/server/continuedev/plugins/recipes/DDtoBQRecipe/steps.py
new file mode 100644
index 00000000..dfe25d9e
--- /dev/null
+++ b/server/continuedev/plugins/recipes/DDtoBQRecipe/steps.py
@@ -0,0 +1,119 @@
+import os
+from textwrap import dedent
+
+from ....core.main import Step
+from ....core.sdk import ContinueSDK, Models
+from ....core.steps import MessageStep
+from ....libs.util.paths import find_data_file
+from ....plugins.steps.find_and_replace import FindAndReplaceStep
+
+AI_ASSISTED_STRING = "(✨ AI-Assisted ✨)"
+
+
+class SetUpChessPipelineStep(Step):
+    hide: bool = True
+    name: str = "Setup Chess.com API dlt Pipeline"
+
+    async def describe(self, models: Models):
+        return "This step will create a new dlt pipeline that loads data from the chess.com API."
+
+    async def run(self, sdk: ContinueSDK):
+        # Run the commands needed to scaffold a new dlt pipeline
+        await sdk.run(
+            [
+                "python3 -m venv .env",
+                "source .env/bin/activate",
+                "pip install dlt",
+                "dlt --non-interactive init chess duckdb",
+                "pip install -r requirements.txt",
+            ],
+            name="Set up Python environment",
+            description=dedent(
+                """\
+                Running the following commands:
+                - `python3 -m venv .env`: Create a Python virtual environment
+                - `source .env/bin/activate`: Activate the virtual environment
+                - `pip install dlt`: Install dlt
+                - `dlt --non-interactive init chess duckdb`: Create a new dlt pipeline called "chess" that loads data into a local DuckDB instance
+                - `pip install -r requirements.txt`: Install the Python dependencies for the pipeline"""
+            ),
+        )
+
+
+class SwitchDestinationStep(Step):
+    hide: bool = True
+
+    async def run(self, sdk: ContinueSDK):
+        # Switch the pipeline destination from DuckDB to Google BigQuery
+        filepath = os.path.join(sdk.ide.workspace_directory, "chess_pipeline.py")
+        await sdk.run_step(
+            FindAndReplaceStep(
+                filepath=filepath,
+                pattern="destination='duckdb'",
+                replacement="destination='bigquery'",
+            )
+        )
+
+        # Template for the BigQuery credentials section of secrets.toml
+        template = dedent(
+            """\
+            [destination.bigquery.credentials]
+            location = "US"  # change the location of the data
+            project_id = "project_id"  # please set me up!
+            private_key = "private_key"  # please set me up!
+            client_email = "client_email"  # please set me up!"""
+        )
+
+        # Open secrets.toml and append the credentials template to the bottom
+        secrets_path = os.path.join(sdk.ide.workspace_directory, ".dlt/secrets.toml")
+        await sdk.ide.setFileOpen(secrets_path)
+        await sdk.append_to_file(secrets_path, template)
+
+        # Wait for the user to replace the placeholders with real credentials
+        await sdk.wait_for_user_confirmation(
+            "Please add your GCP credentials to the `secrets.toml` file and then press `Continue`"
+        )
+
+
+class LoadDataStep(Step):
+    name: str = "Load data to BigQuery"
+    hide: bool = True
+
+    async def run(self, sdk: ContinueSDK):
+        # Run the pipeline again, now loading data into BigQuery
+        output = await sdk.run(
+            ".env/bin/python3 chess_pipeline.py",
+            name="Load data to BigQuery",
+            description="Running `.env/bin/python3 chess_pipeline.py` to load data to Google BigQuery",
+        )
+
+        if "Traceback" in output or "SyntaxError" in output:
+            with open(find_data_file("dlt_duckdb_to_bigquery_docs.md"), "r") as f:
+                docs = f.read()
+
+            # Trim the output to the last traceback before asking the model for help
+            output = "Traceback" + output.split("Traceback")[-1]
+            suggestion = await sdk.models.default.complete(
+                dedent(
+                    f"""\
+                    When trying to load data into BigQuery, the following error occurred:
+
+                    ```ascii
+                    {output}
+                    ```
+
+                    Here is documentation describing common errors and their causes/solutions:
+
+                    {docs}
+
+                    This is a brief summary of the error followed by a suggestion on how it can be fixed:"""
+                )
+            )
+
+            sdk.raise_exception(
+                title="Error while running query",
+                message=output,
+                with_step=MessageStep(
+                    name=f"Suggestion to solve error {AI_ASSISTED_STRING}",
+                    message=suggestion,
+                ),
+            )
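The ConfigFieldMissingException in the docs above lists the exact environment-variable keys `dlt` tries before giving up. As an alternative to `secrets.toml`, those keys can be set directly; a minimal sketch, where the key names are taken from the lookup list in the exception text (pipeline `weatherapi`) and the values are placeholders, not real credentials:

```python
import os

# dlt's environment-variable provider is consulted when secrets.toml has no match.
# Key names mirror the lookup order printed in the exception; values are placeholders.
os.environ["WEATHERAPI__DESTINATION__BIGQUERY__CREDENTIALS__PROJECT_ID"] = "my-project"
os.environ["WEATHERAPI__DESTINATION__BIGQUERY__CREDENTIALS__PRIVATE_KEY"] = "-----BEGIN PRIVATE KEY-----\n..."
os.environ["WEATHERAPI__DESTINATION__BIGQUERY__CREDENTIALS__CLIENT_EMAIL"] = "sa@my-project.iam.gserviceaccount.com"
```

The exception also shows a second, less specific lookup (`WEATHERAPI__DESTINATION__CREDENTIALS__...`), which would be tried next; either form resolves the missing fields.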
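`LoadDataStep` in the diff above trims the pipeline output to the last traceback and hands it, together with the docs file, to an LLM for a suggestion. A purely deterministic variant of that triage can be sketched as below; the `triage` helper is hypothetical, and the signature strings are pulled from the error messages documented in `dlt_duckdb_to_bigquery_docs.md`:

```python
# Map substrings of known BigQuery failures (taken from the docs file above)
# to short diagnoses. Checked in order, most specific first.
KNOWN_ERRORS = {
    "ConfigFieldMissingException": "Credentials missing from secrets.toml",
    "No key could be detected": "Placeholders still in secrets.toml",
    "SERVICE_DISABLED": "BigQuery API not enabled",
    "bigquery.jobs.create": "Missing BigQuery Job User role",
    "does not have permission to query table": "Missing BigQuery Data Editor role",
    "billingNotEnabled": "Billing not enabled / sandbox mode",
}


def triage(output: str) -> str:
    """Return a short diagnosis for the output of a failed pipeline run."""
    # Same trimming LoadDataStep performs: keep only the last traceback.
    if "Traceback" in output:
        output = "Traceback" + output.split("Traceback")[-1]
    for needle, diagnosis in KNOWN_ERRORS.items():
        if needle in output:
            return diagnosis
    return "Unknown error; see dlt_duckdb_to_bigquery_docs.md"
```

This would cover the documented cases without an LLM call; the LLM path in `LoadDataStep` remains useful for errors the docs do not anticipate.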