Zyte automatic extraction to Tinybird
Connect Zyte automatic extraction and Tinybird in our serverless environment
Use this template to fetch Zyte automatic extraction items and append them as CSV entries to a Tinybird datasource.
Zyte automatic extraction items
JavaScript
class HttpSourceZyteAutomaticExtraction {
  async init() {
    // TODO: Create your HTTP credential with your Zyte information:
    // More info at https://yepcode.io/docs/integrations/http/#credential-configuration
    // Get your Zyte API key following the guide at: https://docs.zyte.com/automatic-extraction-get-started.html
    //   baseUrl: https://autoextract.scrapinghub.com/v1
    //   user: <your_automatic_extraction_api_key>
    //   password: empty
    this.axiosClient = yepcode.integration.http(
      "your-http-zyte-credential-name"
    );
  }

  async fetch(publish, done) {
    // TODO: Customize your request checking the API documentation:
    // https://docs.zyte.com/automatic-extraction.html
    const payload = [
      // This sample uses the product list extraction: https://docs.zyte.com/automatic-extraction/product-list.html
      // TODO: Other list or single-object extractions can be requested by changing this payload:
      // articles, comments, job postings, real estate, reviews
      {
        url: "http://books.toscrape.com/",
        pageType: "productList",
      },
    ];
    const { data } = await this.axiosClient.post("/extract", payload);
    if (data && data[0]) {
      // TODO: If the retrieved list is not a productList, change the attributes navigated here
      // TODO: You may also keep retrieving the following pages using the information at data[0].productList.paginationNext
      for (const item of data[0].productList.products) {
        await publish(item);
      }
    }
    done();
  }

  async close() {}
}
Python
class HttpSourceZyteAutomaticExtraction:
    def setup(self):
        # TODO: Create your HTTP credential with your Zyte information:
        # More info at https://yepcode.io/docs/integrations/http/#credential-configuration
        # Get your Zyte API key following the guide at: https://docs.zyte.com/automatic-extraction-get-started.html
        #   baseUrl: https://autoextract.scrapinghub.com/v1
        #   user: <your_automatic_extraction_api_key>
        #   password: empty
        self.session = yepcode.integration.http("your-http-zyte-credential-name")

    def generator(self):
        # TODO: Customize your request checking the API documentation:
        # https://docs.zyte.com/automatic-extraction.html
        # This sample uses the product list extraction: https://docs.zyte.com/automatic-extraction/product-list.html
        # TODO: Other list or single-object extractions can be requested by changing this payload:
        # articles, comments, job postings, real estate, reviews
        # The payload is a list of queries, matching the JavaScript version and the [0] lookup below
        response = self.session.post(
            "/extract",
            json=[{"url": "http://books.toscrape.com/", "pageType": "productList"}],
        )
        response.raise_for_status()
        # TODO: If the retrieved list is not a productList, change the attributes navigated here
        # TODO: You may also keep retrieving the following pages using the information at data[0].productList.paginationNext
        products = response.json()[0]["productList"]["products"]
        for product in products:
            yield product

    def close(self):
        pass
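The pagination TODO in the source above could be handled along the following lines. This is a minimal sketch and not part of the recipe: `iter_products` and the injected `post_extract` callable are hypothetical names, and the code assumes that `paginationNext` is an object carrying a `url` field, which should be checked against the Zyte product list documentation.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical signature: takes the list of queries, returns the parsed JSON response
PostExtract = Callable[[List[Dict[str, Any]]], List[Dict[str, Any]]]


def iter_products(
    post_extract: PostExtract, start_url: str, max_pages: int = 10
) -> Iterator[Dict[str, Any]]:
    """Yield products page by page, following paginationNext while it is present."""
    url = start_url
    seen = set()  # guards against a page that links back to an earlier one
    pages = 0
    while url and url not in seen and pages < max_pages:
        seen.add(url)
        pages += 1
        data = post_extract([{"url": url, "pageType": "productList"}])
        if not data:
            break
        product_list = data[0].get("productList") or {}
        for product in product_list.get("products", []):
            yield product
        # Assumption: paginationNext looks like {"url": "...", ...}
        next_page = product_list.get("paginationNext") or {}
        url = next_page.get("url")
```

Inside `generator`, this could be wired to the credential session with something like `iter_products(lambda payload: self.session.post("/extract", json=payload).json(), "http://books.toscrape.com/")`. The `max_pages` cap is there only to keep a run bounded while testing.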
Append CSV entries to Tinybird datasource
JavaScript
// Assumed dependencies: the "csv" and "form-data" npm packages plus Node's stream module
const csv = require("csv");
const { PassThrough } = require("stream");
const FormData = require("form-data");

class HttpTargetTinybirdAppendEndpoint {
  async init() {
    // TODO: Create your HTTP credential with your Tinybird information:
    // More info at https://yepcode.io/docs/integrations/http/#credential-configuration
    //   baseUrl: https://api.tinybird.co/v0
    //   HTTP Headers: { "Authorization": "Bearer your-token" }
    //   (the token needs append permission on the datasource)
    this.axiosClient = yepcode.integration.http(
      "your-http-tinybird-credential-name"
    );
    // Transforms the items into CSV format
    this.stringifier = csv.stringify({
      delimiter: ",",
    });
    const targetStream = new PassThrough();
    this.stringifier.pipe(targetStream);
    // Append the stream to the Tinybird endpoint
    const formData = new FormData();
    formData.append("csv", targetStream);
    // TODO: Customize your request checking the API documentation:
    // https://www.tinybird.co/docs/api-reference/datasource-api.html
    // The request is started here and awaited in close(), once all rows are written
    this.tinybirdPost = this.axiosClient.post(
      `/datasources?format=csv&mode=append&name=your_end_point_name`,
      formData,
      {
        headers: {
          ...formData.getHeaders(),
        },
      }
    );
  }

  async consume(item) {
    // TODO: Map item to your CSV format
    const csvRow = [item.col1, item.col2, item.col3];
    this.stringifier.write(csvRow);
  }

  async close() {
    this.stringifier.end();
    await this.tinybirdPost;
  }
}
Python
import pandas as pd


class HttpTargetTinybirdAppendEndpoint:
    def setup(self):
        # TODO: Create your HTTP credential with your Tinybird information:
        # More info at https://yepcode.io/docs/integrations/http/#credential-configuration
        #   baseUrl: https://api.tinybird.co/v0
        #   HTTP Headers: { "Authorization": "Bearer your-token" }
        #   (the token needs append permission on the datasource)
        self.session = yepcode.integration.http("your-http-tinybird-credential-name")
        # Rows are buffered here and sent in a single request on close()
        self.rows = []

    def consume(self, generator, done):
        for item in generator:
            self.process(item)
        done()

    def process(self, item):
        # TODO: Map item to your CSV format
        self.rows.append([item["col1"], item["col2"], item["col3"]])

    def close(self):
        # Build the DataFrame once from the buffered rows (DataFrame.append no longer exists in pandas 2.x)
        # header=False keeps pandas' default column labels (0, 1, 2) out of the appended data
        csv_body = pd.DataFrame(self.rows).to_csv(index=False, header=False)
        payload = {"csv": (None, csv_body)}
        # TODO: Customize your endpoint name checking the API documentation
        # https://www.tinybird.co/docs/api-reference/datasource-api.html
        # Using the files attribute sends the payload with content-type multipart/form-data
        response = self.session.post(
            "/datasources",
            params={"name": "your-endpoint-name", "format": "csv", "mode": "append"},
            files=payload,
        )
        response.raise_for_status()
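The "Map item to your CSV format" TODO is where extracted items become rows. As a minimal sketch of that mapping without pandas, the standard library `csv` module can serialize the items directly. The helper name `items_to_csv` and the column names used in the usage line (`name`, `price`, `url`) are illustrative assumptions, not fields guaranteed by the recipe; adjust them to whatever your extraction actually returns and to your datasource schema.

```python
import csv
import io
from typing import Any, Dict, Iterable, Sequence


def items_to_csv(items: Iterable[Dict[str, Any]], columns: Sequence[str]) -> str:
    """Serialize items to header-less CSV text, one row per item, in column order."""
    buffer = io.StringIO()
    # csv.writer quotes values containing the delimiter, quotes, or newlines
    writer = csv.writer(buffer, delimiter=",", lineterminator="\n")
    for item in items:
        # Missing keys become empty cells so every row has the same width
        writer.writerow([item.get(column, "") for column in columns])
    return buffer.getvalue()
```

For example, `items_to_csv(products, ["name", "price", "url"])` returns a string that could be passed as the `csv` part of the multipart payload in `close()`. Emitting no header row mirrors the JavaScript version, which writes only data rows.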
FAQs
YepCode is a SaaS platform that lets you create, execute, and monitor integrations and automations using source code in a serverless environment.
We like to call it the Zapier for developers: it brings the agility and benefits of NoCode tools (no server provisioning, environment configuration, or deployments) together with the full power of a programming language like JavaScript or Python.
These recipes are a good starting point for you to build your own YepCode processes and solve your integration and automation problems.
You only have to fill in the sign-up form, and your account will be created on our FREE plan (no credit card required).
YepCode has been built with a clear enterprise approach (multi-tenant environment, team management, high security and auditing standards, IdP integrations, on-premise options), so it can be the Swiss army knife of any engineering team, especially teams that need to extract information from or send it to external systems and need that process to adapt quickly to change.
Sure! You just need to do some configuration to allow YepCode servers to connect to that service. Check our docs page to get more information.