author | Ty Dunn <ty@tydunn.com> | 2023-09-07 11:19:38 -0700
committer | GitHub <noreply@github.com> | 2023-09-07 11:19:38 -0700
commit | 8dad79af8b18c08e270382ce1b18a3956fa59626 (patch)
tree | 2317e2a5e43624e54e7894e016b9a84a08d1cec9 /docs
parent | 65887473f4c6711d2a64a087f835f86556c75dff (diff)
adding support for Hugging Face Inference Endpoints (#460)
* stream complete sketch
* correct structure but issues
* refactor: :art: clean up hf_inference_api.py
* fix: :bug: quick fix in hf_inference_api.py
* feat: :memo: update documentation code for hf_inference_api
* hf docs
* now working
---------
Co-authored-by: Nate Sesti <sestinj@gmail.com>
Diffstat (limited to 'docs')
-rw-r--r-- | docs/docs/customization.md | 20
1 file changed, 20 insertions, 0 deletions
diff --git a/docs/docs/customization.md b/docs/docs/customization.md
index 5fc3eab5..fb7dc0c5 100644
--- a/docs/docs/customization.md
+++ b/docs/docs/customization.md
@@ -21,6 +21,7 @@ Open-Source Models (not local)
 
 - [TogetherLLM](#together) - Use any model from the [Together Models list](https://docs.together.ai/docs/models-inference) with your Together API key.
 - [ReplicateLLM](#replicate) - Use any open-source model from the [Replicate Streaming List](https://replicate.com/collections/streaming-language-models) with your Replicate API key.
+- [HuggingFaceInferenceAPI](#huggingface) - Use any open-source model from the [Hugging Face Inference API](https://huggingface.co/inference-api) with your Hugging Face token.
 
 ## Change the default LLM
 
@@ -206,6 +207,25 @@ config = ContinueConfig(
 
 If you don't specify the `model` parameter, it will default to `replicate/llama-2-70b-chat:58d078176e02c219e11eb4da5a02a7830a283b14cf8f94537af893ccff5ee781`.
 
+### Hugging Face
+
+The Hugging Face Inference API is a great option for newly released language models. Sign up for an account and add billing [here](https://huggingface.co/settings/billing), open the Inference Endpoints page [here](https://ui.endpoints.huggingface.co), click “New endpoint”, fill out the form (e.g. select a model like [WizardCoder-Python-34B-V1.0](https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0)), and then deploy your model by clicking “Create Endpoint”. Change `~/.continue/config.py` to look like this:
+
+```python
+from continuedev.src.continuedev.core.models import Models
+from continuedev.src.continuedev.libs.llm.hf_inference_api import HuggingFaceInferenceAPI
+
+config = ContinueConfig(
+    ...
+    models=Models(
+        default=HuggingFaceInferenceAPI(
+            endpoint_url="<INFERENCE_API_ENDPOINT_URL>",
+            hf_token="<HUGGING_FACE_TOKEN>",
+        )
+    ),
+)
+```
+
 ### Self-hosting an open-source model
 
 If you want to self-host on Colab, RunPod, HuggingFace, Haven, or another hosting provider, you will need to wire up a new LLM class. It only needs to implement 3 primary methods: `stream_complete`, `complete`, and `stream_chat`, and you can see examples in `continuedev/src/continuedev/libs/llm`.
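Before pointing `~/.continue/config.py` at a newly created Inference Endpoint, it can help to hit the endpoint directly and confirm it responds. The sketch below is illustrative, not part of this commit: it assumes a text-generation task endpoint, and the exact request payload and response shape depend on the task you deployed.

```python
# Hedged sketch: sanity-check a deployed Hugging Face Inference Endpoint
# before wiring it into Continue. Assumes a text-generation endpoint;
# payload/response shapes vary by deployed task.
import requests

API_URL = "<INFERENCE_API_ENDPOINT_URL>"  # copied from the endpoint's page
HF_TOKEN = "<HUGGING_FACE_TOKEN>"         # a token with access to the endpoint

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={"inputs": "def fibonacci(n):", "parameters": {"max_new_tokens": 64}},
    timeout=60,
)
response.raise_for_status()
print(response.json())  # typically [{"generated_text": "..."}] for text generation
```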
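For the self-hosting path described at the end of the diff, the three methods named there are the whole surface area. The following is a minimal, hypothetical sketch of the shape such a class might take; the class name, constructor, method signatures, and the `/complete` and `/stream` server routes are all assumptions for illustration, and the authoritative examples remain the ones in `continuedev/src/continuedev/libs/llm`.

```python
# Hypothetical sketch of a self-hosted LLM wrapper. The class shape,
# signatures, and server routes are illustrative assumptions only;
# see continuedev/src/continuedev/libs/llm for the real examples.
import aiohttp


class SelfHostedLLM:
    """Talks to a hypothetical completion server exposing /complete and /stream."""

    def __init__(self, server_url: str):
        self.server_url = server_url.rstrip("/")  # e.g. "http://localhost:8000"

    async def complete(self, prompt: str, **kwargs) -> str:
        # Single-shot completion: send the prompt, return the full text.
        async with aiohttp.ClientSession() as session:
            async with session.post(
                f"{self.server_url}/complete", json={"prompt": prompt}
            ) as resp:
                data = await resp.json()
                return data["completion"]

    async def stream_complete(self, prompt: str, **kwargs):
        # Streaming completion: yield text chunks as the server emits them.
        async with aiohttp.ClientSession() as session:
            async with session.post(
                f"{self.server_url}/stream", json={"prompt": prompt}
            ) as resp:
                async for chunk in resp.content.iter_any():
                    yield chunk.decode("utf-8")

    async def stream_chat(self, messages, **kwargs):
        # Simple fallback: flatten chat messages into a prompt and reuse
        # the streaming completion endpoint.
        prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        async for chunk in self.stream_complete(prompt, **kwargs):
            yield chunk
```

Once a class along these lines exists, it slots into the same `models=Models(default=...)` position shown in the Hugging Face example in the diff above.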