author | Ty Dunn <ty@tydunn.com> | 2023-09-07 11:19:38 -0700
committer | GitHub <noreply@github.com> | 2023-09-07 11:19:38 -0700
commit | 8dad79af8b18c08e270382ce1b18a3956fa59626 (patch)
tree | 2317e2a5e43624e54e7894e016b9a84a08d1cec9 /docs
parent | 65887473f4c6711d2a64a087f835f86556c75dff (diff)
adding support for Hugging Face Inference Endpoints (#460)
* stream complete sketch
* correct structure but issues
* refactor: :art: clean up hf_inference_api.py
* fix: :bug: quick fix in hf_inference_api.py
* feat: :memo: update documentation code for hf_inference_api
* hf docs
* now working
---------
Co-authored-by: Nate Sesti <sestinj@gmail.com>
Diffstat (limited to 'docs')
-rw-r--r-- | docs/docs/customization.md | 20
1 file changed, 20 insertions, 0 deletions
diff --git a/docs/docs/customization.md b/docs/docs/customization.md
index 5fc3eab5..fb7dc0c5 100644
--- a/docs/docs/customization.md
+++ b/docs/docs/customization.md
@@ -21,6 +21,7 @@ Open-Source Models (not local)
 
 - [TogetherLLM](#together) - Use any model from the [Together Models list](https://docs.together.ai/docs/models-inference) with your Together API key.
 - [ReplicateLLM](#replicate) - Use any open-source model from the [Replicate Streaming List](https://replicate.com/collections/streaming-language-models) with your Replicate API key.
+- [HuggingFaceInferenceAPI](#huggingface) - Use any open-source model from the [Hugging Face Inference API](https://huggingface.co/inference-api) with your Hugging Face token.
 
 ## Change the default LLM
 
@@ -206,6 +207,25 @@ config = ContinueConfig(
 
 If you don't specify the `model` parameter, it will default to `replicate/llama-2-70b-chat:58d078176e02c219e11eb4da5a02a7830a283b14cf8f94537af893ccff5ee781`.
 
+### Hugging Face
+
+The Hugging Face Inference API is a great option for newly released language models. Sign up for an account and add billing [here](https://huggingface.co/settings/billing), open the Inference Endpoints page [here](https://ui.endpoints.huggingface.co), click “New endpoint”, fill out the form (e.g. select a model like [WizardCoder-Python-34B-V1.0](https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0)), and then deploy your model by clicking “Create Endpoint”. Change `~/.continue/config.py` to look like this:
+
+```python
+from continuedev.src.continuedev.core.models import Models
+from continuedev.src.continuedev.libs.llm.hf_inference_api import HuggingFaceInferenceAPI
+
+config = ContinueConfig(
+    ...
+    models=Models(
+        default=HuggingFaceInferenceAPI(
+            endpoint_url="<INFERENCE_API_ENDPOINT_URL>",
+            hf_token="<HUGGING_FACE_TOKEN>",
+        )
+    ),
+)
+```
+
 ### Self-hosting an open-source model
 
 If you want to self-host on Colab, RunPod, HuggingFace, Haven, or another hosting provider, you will need to wire up a new LLM class. It only needs to implement 3 primary methods: `stream_complete`, `complete`, and `stream_chat`, and you can see examples in `continuedev/src/continuedev/libs/llm`.
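Before pointing `~/.continue/config.py` at a newly created Inference Endpoint, it can help to hit the endpoint directly and confirm it responds. The sketch below is illustrative, not part of this commit: it assumes a text-generation task endpoint, and the exact request payload and response shape depend on the task you deployed.

```python
# Hedged sketch: sanity-check a deployed Hugging Face Inference Endpoint
# before wiring it into Continue. Assumes a text-generation endpoint;
# payload/response shapes vary by deployed task.
import requests

API_URL = "<INFERENCE_API_ENDPOINT_URL>"  # copied from the endpoint's page
HF_TOKEN = "<HUGGING_FACE_TOKEN>"         # a token with access to the endpoint

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={"inputs": "def fibonacci(n):", "parameters": {"max_new_tokens": 64}},
    timeout=60,
)
response.raise_for_status()
print(response.json())  # typically [{"generated_text": "..."}] for text generation
```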
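For the self-hosting path described at the end of the diff, the three methods named there are the whole surface area. The following is a minimal, hypothetical sketch of the shape such a class might take; the class name, constructor, method signatures, and the `/complete` and `/stream` server routes are all assumptions for illustration, and the authoritative examples remain the ones in `continuedev/src/continuedev/libs/llm`.

```python
# Hypothetical sketch of a self-hosted LLM wrapper. The class shape,
# signatures, and server routes are illustrative assumptions only;
# see continuedev/src/continuedev/libs/llm for the real examples.
import aiohttp


class SelfHostedLLM:
    """Talks to a hypothetical completion server exposing /complete and /stream."""

    def __init__(self, server_url: str):
        self.server_url = server_url.rstrip("/")  # e.g. "http://localhost:8000"

    async def complete(self, prompt: str, **kwargs) -> str:
        # Single-shot completion: send the prompt, return the full text.
        async with aiohttp.ClientSession() as session:
            async with session.post(
                f"{self.server_url}/complete", json={"prompt": prompt}
            ) as resp:
                data = await resp.json()
                return data["completion"]

    async def stream_complete(self, prompt: str, **kwargs):
        # Streaming completion: yield text chunks as the server emits them.
        async with aiohttp.ClientSession() as session:
            async with session.post(
                f"{self.server_url}/stream", json={"prompt": prompt}
            ) as resp:
                async for chunk in resp.content.iter_any():
                    yield chunk.decode("utf-8")

    async def stream_chat(self, messages, **kwargs):
        # Simple fallback: flatten chat messages into a prompt and reuse
        # the streaming completion endpoint.
        prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        async for chunk in self.stream_complete(prompt, **kwargs):
            yield chunk
```

Once a class along these lines exists, it slots into the same `models=Models(default=...)` position shown in the Hugging Face example in the diff above.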