Inside ConveyThis Tech: Building Our Website Crawler
Improving the User Experience: ConveyThis Introduces URL Management
Many ConveyThis customers want every URL on their website translated, which can be a demanding task, especially for large sites translated into several languages.
User feedback showed that some clients found starting their first website translation project confusing. They often asked why only the homepage URL appeared in the translation list, and how to generate translations for the rest of their content.
This indicated a potential area for enhancement. We saw an opportunity to facilitate a smoother onboarding process and more efficient project management. However, we lacked a concrete solution at that moment.
The result, as you might have surmised, was the introduction of the URL Management feature. It enables users to scan their website’s URLs and generate their translated content via the ConveyThis Dashboard, swiftly and effectively.
Recently, this feature was relocated from the Translation List to a new, more adaptable and powerful URL-based translation management page. Now, we believe it’s time to reveal the story behind this feature’s inception.
Embracing Golang: ConveyThis' Journey Towards Enhanced Translation Services
The onset of the 2020 pandemic lockdown gave me the chance to finally learn Golang, a programming language I had been putting off owing to time constraints.
Developed by Google, Golang (or Go) has been gaining popularity in recent years. A statically typed, compiled programming language, Go was designed to help developers write efficient, reliable, and concurrent code. Its simplicity makes it easier to write and maintain large, complex programs without sacrificing speed.
While I was looking for a side project to familiarize myself with Golang, a web crawler sprang to mind. It played to the strengths described above and potentially offered a solution for ConveyThis users. A web crawler, or ‘bot’, is a program that visits a website to extract data.
For ConveyThis, our aim was to develop a tool for users to scan their site and retrieve all the URLs. Additionally, we wanted to streamline the process of generating translations. At the time, users had to visit their website in a translated language to generate them, a task that becomes daunting for large, multi-language sites.
Although the initial prototype was straightforward – a program that takes a URL as input and starts crawling the site – it was quick and effective. Alex, ConveyThis’ CTO, saw the potential of this solution and gave the go-ahead for research and development to refine the concept and contemplate how to host the future production service.
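To make the idea concrete, here is a minimal sketch of a crawler of that shape in Go. It is not the ConveyThis bot: the names crawl, fetchFunc and httpFetch, the page limit, and the regular-expression link extraction are all assumptions made for illustration (a production crawler would use a proper HTML parser and respect robots.txt).

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"regexp"
	"time"
)

// hrefPattern is a deliberately simple way to find links. It is an
// illustrative shortcut; a real crawler would parse the HTML properly.
var hrefPattern = regexp.MustCompile(`(?i)href\s*=\s*["']([^"']+)["']`)

// fetchFunc returns the body of a page. It is a parameter so the crawl
// logic can be exercised without network access (hypothetical design).
type fetchFunc func(pageURL string) (string, error)

// crawl visits pages breadth-first from start, stays on the starting host,
// and returns the visited URLs in discovery order (at most maxPages).
func crawl(start string, maxPages int, fetch fetchFunc) ([]string, error) {
	root, err := url.Parse(start)
	if err != nil {
		return nil, err
	}
	root.Fragment = ""
	seen := map[string]bool{root.String(): true}
	queue := []string{root.String()}
	var found []string

	for len(queue) > 0 && len(found) < maxPages {
		current := queue[0]
		queue = queue[1:]
		found = append(found, current)

		body, err := fetch(current)
		if err != nil {
			continue // skip pages that fail to load
		}
		base, err := url.Parse(current)
		if err != nil {
			continue
		}
		for _, m := range hrefPattern.FindAllStringSubmatch(body, -1) {
			link, err := base.Parse(m[1]) // resolve relative links
			if err != nil {
				continue
			}
			link.Fragment = "" // "#section" variants are the same page
			if link.Host != root.Host || (link.Scheme != "http" && link.Scheme != "https") {
				continue
			}
			if s := link.String(); !seen[s] {
				seen[s] = true
				queue = append(queue, s)
			}
		}
	}
	return found, nil
}

// httpFetch downloads a page with a timeout and a 1 MiB size cap.
func httpFetch(pageURL string) (string, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(pageURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	data, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	return string(data), err
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: crawler <start-url>")
		return
	}
	urls, err := crawl(os.Args[1], 100, httpFetch)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Passing the fetch function in as a parameter keeps the traversal logic separate from the network code, which makes it easy to try the crawler against a fixed set of pages before pointing it at a live site.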
Navigating the Serverless Trend with Go and ConveyThis
In the process of finalizing the web crawler bot, we found ourselves grappling with the nuances of different CMS and integrations. The question then arose – how can we best present our users with the bot?
Initially, we considered the tried and tested approach of using AWS with a web server interface. However, several potential issues emerged. We were uncertain about the server load, about simultaneous use by multiple users, and about our own lack of experience hosting Go programs.
This led us to consider a serverless hosting scenario. This offered benefits such as infrastructure management by the provider and inherent scalability, making it an ideal solution for ConveyThis. It meant we didn’t have to worry about server capacity since each request would operate in its own isolated container.
However, when we started, we were working with a 5-minute execution limit for serverless code. This was a problem for our bot, which might need to crawl large e-commerce sites with numerous pages. Fortunately, AWS had extended the limit to 15 minutes, although enabling the longer timeout proved to be a challenging task. Eventually, we found the solution by triggering the serverless code with SQS, the AWS message queuing service.
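As a rough illustration of what queue-triggered serverless code can look like, the sketch below decodes an SQS-style event and runs each job under a 15-minute deadline. It uses only the standard library so that it runs anywhere: the local sqsEvent type mirrors just the Records, messageId and body fields of the event AWS delivers, and the crawlJob message format, jobsFromEvent and runJobs are hypothetical names, not the ConveyThis implementation. In a real deployment the AWS SDK for Go would supply the event types and the entry point.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// sqsEvent mirrors only the fields of the SQS event payload used here.
type sqsEvent struct {
	Records []struct {
		MessageID string `json:"messageId"`
		Body      string `json:"body"`
	} `json:"Records"`
}

// crawlJob is a hypothetical message format: one site to crawl per message.
type crawlJob struct {
	URL string `json:"url"`
}

// jobsFromEvent decodes the event and the JSON body of every message in it.
func jobsFromEvent(raw string) ([]crawlJob, error) {
	var ev sqsEvent
	if err := json.Unmarshal([]byte(raw), &ev); err != nil {
		return nil, fmt.Errorf("decode event: %w", err)
	}
	jobs := make([]crawlJob, 0, len(ev.Records))
	for _, r := range ev.Records {
		var job crawlJob
		if err := json.Unmarshal([]byte(r.Body), &job); err != nil {
			return nil, fmt.Errorf("decode message %s: %w", r.MessageID, err)
		}
		if job.URL == "" {
			return nil, errors.New("message " + r.MessageID + " has no url")
		}
		jobs = append(jobs, job)
	}
	return jobs, nil
}

// runJobs processes jobs in order and stops early once the context expires,
// so a long crawl does not run past the execution limit.
func runJobs(ctx context.Context, jobs []crawlJob, crawl func(context.Context, string) error) error {
	for _, job := range jobs {
		if err := ctx.Err(); err != nil {
			return err
		}
		if err := crawl(ctx, job.URL); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	event := `{"Records":[{"messageId":"1","body":"{\"url\":\"https://example.com\"}"}]}`
	jobs, err := jobsFromEvent(event)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// 15 minutes matches the limit mentioned in the text.
	ctx, cancel := context.WithTimeout(context.Background(), 15*time.Minute)
	defer cancel()
	err = runJobs(ctx, jobs, func(ctx context.Context, u string) error {
		fmt.Println("would crawl", u) // placeholder for the real crawl
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

Putting a queue in front of the function also means that a burst of crawl requests is buffered rather than dropped, with each message handled in its own invocation.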
The Journey to Interactive Real-Time Bot Communications with ConveyThis
With the hosting dilemma resolved, we had another hurdle to overcome. We now had a functional bot, hosted in an efficient, scalable manner. The remaining task was to relay the bot-generated data to our users.
Aiming for maximum interactivity, I decided on real-time communication between the bot and the ConveyThis dashboard. While real-time isn’t a requirement for such a feature, I wanted our users to get immediate feedback as soon as the bot started working.
To achieve this, we developed a simple Node.js WebSocket server, hosted on an AWS EC2 instance. This required some tweaks to the bot so it could communicate with the WebSocket server, as well as automating deployment. After thorough testing, we were ready to transition to production.
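On the bot side, real-time feedback comes down to emitting small progress messages as the crawl advances. The sketch below shows one possible shape for that in Go, assuming a JSON message per event. The progressEvent fields, the event names and the report helper are hypothetical, and the messages are written to standard output here where a real bot would write to its WebSocket connection.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// progressEvent is a hypothetical message the bot could send to the
// dashboard each time something noteworthy happens during a crawl.
type progressEvent struct {
	Type  string `json:"type"`          // e.g. "url_found" or "done"
	URL   string `json:"url,omitempty"` // page concerned, if any
	Count int    `json:"count"`         // URLs discovered so far
}

// encodeEvent serializes one event as a compact JSON string.
func encodeEvent(kind, pageURL string, count int) (string, error) {
	data, err := json.Marshal(progressEvent{Type: kind, URL: pageURL, Count: count})
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// report writes one event per line to w. In production w would wrap the
// connection to the WebSocket server; any io.Writer works for testing.
func report(w io.Writer, kind, pageURL string, count int) error {
	msg, err := encodeEvent(kind, pageURL, count)
	if err != nil {
		return err
	}
	_, err = fmt.Fprintln(w, msg)
	return err
}

func main() {
	found := []string{"https://example.com/", "https://example.com/about"}
	for i, u := range found {
		if err := report(os.Stdout, "url_found", u, i+1); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	if err := report(os.Stdout, "done", "", len(found)); err != nil {
		fmt.Println("error:", err)
	}
}
```

Keeping the messages small and self-describing lets the dashboard update a counter or a list as each one arrives, without waiting for the crawl to finish.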
What started as a side project ultimately found its place in the dashboard. Through the challenges, I gained knowledge in Go and honed my skills in the AWS environment. I found Go particularly well suited to networking tasks, concurrent programming, and serverless computing, given its low memory footprint.
The bot also opens up new opportunities. We aim to rewrite our word count tool for better efficiency, and potentially to use the bot for cache warming. I hope you enjoyed this sneak peek into ConveyThis’s tech world as much as I have enjoyed sharing it.