Zyte automatic extraction to Amazon S3 CSV file
Connect Zyte automatic extraction and Amazon S3 CSV file in our serverless environment
Use this template to fetch Zyte automatic extraction items and create CSV file entries from them in an Amazon S3 bucket.
Zyte automatic extraction items
JavaScript:

class HttpSourceZyteAutomaticExtraction {
  async init() {
    // TODO: Create your http credential with zyte information:
    // More info at https://yepcode.io/docs/integrations/http/#credential-configuration
    // Get your Zyte API key following guide at: https://docs.zyte.com/automatic-extraction-get-started.html
    // baseUrl: https://autoextract.scrapinghub.com/v1
    // user: <your_automatic_extraction_api_key>
    // password: empty
    this.httpClient = yepcode.integration.http(
      "your-http-zyte-credential-name"
    );
  }

  async fetch(publish, done) {
    // TODO: Customize your request checking the API documentation:
    // https://docs.zyte.com/automatic-extraction.html
    const payload = [
      // In this sample we'll use the product list extraction: https://docs.zyte.com/automatic-extraction/product-list.html
      // TODO: Other list or single-object extractions can be done by changing this payload:
      // articles, comments, job postings, real estate, reviews
      {
        url: "http://books.toscrape.com/",
        pageType: "productList",
      },
    ];
    const { data } = await this.httpClient.post("/extract", payload);
    if (data && data[0]) {
      // TODO: If the retrieved list is not a productList, change the attribute path below
      // TODO: You may also keep retrieving the following pages using the information at data[0].productList.paginationNext
      for (const item of data[0].productList.products) {
        await publish(item);
      }
    }
    done();
  }

  async close() {}
}
Python:

class HttpSourceZyteAutomaticExtraction:
    def setup(self):
        # TODO: Create your http credential with zyte information:
        # More info at https://yepcode.io/docs/integrations/http/#credential-configuration
        # Get your Zyte API key following guide at: https://docs.zyte.com/automatic-extraction-get-started.html
        # baseUrl: https://autoextract.scrapinghub.com/v1
        # user: <your_automatic_extraction_api_key>
        # password: empty
        self.session = yepcode.integration.http("your-http-zyte-credential-name")

    def generator(self):
        # TODO: Customize your request checking the API documentation:
        # https://docs.zyte.com/automatic-extraction.html
        # In this sample we'll use the product list extraction: https://docs.zyte.com/automatic-extraction/product-list.html
        # TODO: Other list or single-object extractions can be done by changing this payload:
        # articles, comments, job postings, real estate, reviews
        # The payload is a list of queries, matching the JavaScript version and the [0] index used below
        response = self.session.post(
            "/extract",
            json=[{"url": "http://books.toscrape.com/", "pageType": "productList"}],
        )
        response.raise_for_status()
        # TODO: If the retrieved list is not a productList, change the attribute path below
        # TODO: You may also keep retrieving the following pages using the information at data[0].productList.paginationNext
        products = response.json()[0]["productList"]["products"]
        for product in products:
            yield product

    def close(self):
        pass
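The pagination TODO above can be sketched as a small generator. This is only an illustration under stated assumptions: `iter_products`, `fetch_page` and `max_pages` are hypothetical names, and the shape `paginationNext.url` is assumed from the comment in the recipe rather than confirmed here. In the recipe, `fetch_page` would wrap the `/extract` request for one URL and return the first element of the JSON response.

```python
from typing import Callable, Iterator, Optional


def iter_products(
    fetch_page: Callable[[str], dict],
    start_url: str,
    max_pages: int = 10,
) -> Iterator[dict]:
    """Yield products page by page, following paginationNext links."""
    url: Optional[str] = start_url
    seen = set()  # guards against a page that links back to itself
    pages = 0
    while url and url not in seen and pages < max_pages:
        seen.add(url)
        pages += 1
        product_list = fetch_page(url).get("productList") or {}
        yield from product_list.get("products") or []
        # Assumed shape: {"paginationNext": {"url": "..."}}, absent on the last page.
        url = (product_list.get("paginationNext") or {}).get("url")


# Stand-in for real API responses, keyed by page URL (made-up data).
FAKE_PAGES = {
    "http://example.com/p1": {
        "productList": {
            "products": [{"name": "A"}],
            "paginationNext": {"url": "http://example.com/p2"},
        }
    },
    "http://example.com/p2": {"productList": {"products": [{"name": "B"}]}},
}

names = [
    product["name"]
    for product in iter_products(lambda url: FAKE_PAGES[url], "http://example.com/p1")
]
```

The `max_pages` cap and the `seen` set keep a misbehaving site from turning the source into an endless loop; both limits are a design choice of this sketch, not something the API requires.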
Create CSV file entries in Amazon S3 Bucket
JavaScript:

const csv = require("csv");
const { PassThrough } = require("stream");
const { Upload } = require("@aws-sdk/lib-storage");

class AwsS3TargetUploadCsv {
  async init() {
    // TODO: Create your aws-s3 credential
    // More info at https://yepcode.io/docs/integrations/aws-s3/#credential-configuration
    this.awsS3 = yepcode.integration.awsS3("your-aws-s3-credential-name");

    // Transforms the items into CSV format
    this.stringifier = csv.stringify({
      delimiter: ",",
    });
    // The CSV output is piped into a stream that serves as the upload body
    this.targetStream = new PassThrough();
    this.stringifier.pipe(this.targetStream);

    // TODO: customize the Upload content
    // More info at: https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/interfaces/_aws_sdk_lib_storage.options-1.html
    this.upload = new Upload({
      client: this.awsS3,
      params: {
        Bucket: "your-bucket-name",
        Key: "your-file-name.csv",
        Body: this.targetStream,
      },
    });
    this.upload.on("httpUploadProgress", (progress) => {
      console.log(`Upload progress`, progress);
    });
    this.uploadPromise = this.upload.done();
  }

  async consume(item) {
    // TODO: customize the csv row to create from your item content
    const csvRow = [item.value, item.text];
    this.stringifier.write(csvRow);
  }

  async close() {
    // Ending the stringifier closes the stream so the upload can finish
    try {
      this.stringifier.end();
    } catch (error) {
      console.error(`Error ending stringifier`, error);
    }
    try {
      await this.uploadPromise;
    } catch (error) {
      console.error(`Error ending upload`, error);
    }
  }
}
Python:

import csv
import io


class AccumulatingStream:
    # Collects the CSV text written by csv.writer as UTF-8 bytes in memory
    def __init__(self):
        self.data = io.BytesIO()

    def write(self, item):
        self.data.write(item.encode("utf-8"))

    def get_stream(self):
        self.data.seek(0)
        return self.data


class AwsS3TargetUploadCsv:
    def setup(self):
        # TODO: Create your S3 credential:
        # More info at https://yepcode.io/docs/integrations/aws-s3/#credential-configuration
        self.aws_s3_client = yepcode.integration.awsS3("your-s3-credential-name")
        self.acc_stream = AccumulatingStream()
        self.stringifier = csv.writer(self.acc_stream, delimiter=",")

    def consume(self, generator, done):
        for item in generator:
            # TODO: customize the csv row to create from your item content
            csv_row = [item["value"], item["text"]]
            self.stringifier.writerow(csv_row)
        done()

    def close(self):
        # TODO: customize the bucket name and object key
        try:
            self.aws_s3_client.upload_fileobj(
                self.acc_stream.get_stream(),
                "bucket-name",
                "path/to/object.csv",
            )
        except Exception as error:
            print(f"Error uploading object: {error}")
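The "customize the csv row" TODO is where the source items meet the CSV layout. A minimal sketch of that mapping follows, assuming the product items carry `name`, `price`, `currency` and `url` keys; those field names, together with `COLUMNS`, `product_to_row` and `products_to_csv`, are illustrative assumptions and should be checked against the items your extraction actually returns.

```python
import csv
import io

# Assumed column names; adjust to the fields your extraction actually returns.
COLUMNS = ["name", "price", "currency", "url"]


def product_to_row(product: dict) -> list:
    """Map one product item to a CSV row, using "" for missing fields."""
    return [product.get(column, "") for column in COLUMNS]


def products_to_csv(products) -> str:
    """Render a header line plus one line per product as CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=",")
    writer.writerow(COLUMNS)
    for product in products:
        writer.writerow(product_to_row(product))
    return buffer.getvalue()


# Made-up item: the comma in the name shows that csv.writer quotes it.
sample_csv = products_to_csv([{"name": "Book, vol. 1", "price": "51.77"}])
```

Using `dict.get` with a default keeps one incomplete item from raising `KeyError` and aborting the whole export, which a direct lookup such as `item["value"]` would do.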
FAQs
YepCode is a SaaS platform that enables the creation, execution and monitoring of integrations and automations using source code in a serverless environment.
We like to call it the Zapier for developers: it brings the agility and benefits of NoCode tools (no server provisioning, environment configuration, or deployments) together with the power of a programming language like JavaScript or Python.
These recipes are an excellent starting point for creating your own YepCode processes and solving complex integration and automation problems.
You only have to complete the sign-up form, and your account will be created on our FREE plan (no credit card required).
YepCode has been created with a clear enterprise focus, offering a multi-tenant environment, team management capabilities, high security and auditing standards, Identity Provider (IdP) integrations, and on-premise options. It serves as the Swiss army knife for engineering teams, especially those requiring the extraction or transmission of information to external systems. It excels in scenarios demanding flexibility and adaptability to change within the process.
Sure! You only need to configure YepCode servers to establish a connection with that service. Check our docs page to get more information.