Web scraping is a powerful tool for collecting and analyzing real-time housing data. However, scraping housing data raises several legal and technical concerns. This panel will provide perspectives on ongoing web scraping initiatives, lessons learned, and various web scraping applications.
This webinar is being presented as part of the Community Data Program's Annual Meeting and is open to all CDP members.
Date: Friday, May 12, 2023, 10:40 am to 11:40 am Eastern Time
Number of registrants: 32
Number of participants: 20 online, 21 in-person
About the presenters:
Shirin Roshanafshar, Statistics Canada
Shirin Roshanafshar is a Manager in the Data Science Division at Statistics Canada. She has a bachelor’s and a master’s degree in Statistics and has been working at StatsCan for over 17 years. Shirin joined the Data Science Division over 3 years ago and has been managing the Text Analytics and Digitalization Section.
With the exponentially increasing amount of unstructured text data, it’s becoming more difficult to extract valuable information and insights from data. The Text Analytics and Digitalization Section leads Machine Learning research and development in the area of text analytics, focusing on innovative, state-of-the-art solutions. The section’s core areas of expertise include: Natural Language Processing/Generation, deep learning and neural networks, topic modeling, information retrieval/extraction, text summarization, classification, unsupervised learning, named entity recognition, conversational agents/chatbots, sentiment analysis, web scraping and data visualization.
Irena Pozgaj Jones, Georgian College (Pronunciation: eye-REE-nah Pōze-guy Jo-owns)
As a Social Impact Fellow with the Centre for Changemaking and Social Innovation at Georgian College, Irena supports the evaluation design of community-based research projects and the engagement of student and community stakeholders in the research. Irena brings over 18 years of expertise leading social and community services research projects within municipal government.
A lifelong learner, Irena has an Environmental Engineering Technology diploma, a Bachelor of Arts degree, a Research Analyst post-graduate diploma, and a post-graduate certificate in Community Economic Development.
Wyatt Tensuda, Community Data Program
Wyatt Tensuda is currently earning his Master of Science in Analytics from the Georgia Institute of Technology and previously majored in Physics at the University of Ontario Institute of Technology. Wyatt is passionate about learning and problem solving, holding the core belief that there is always more to learn in life and that the attainment of knowledge results in greater happiness. He is a self-taught programmer who has experience mainly with Python but also HTML, CSS, JavaScript, and R. In 2022, Wyatt worked with Creative Neighbourhoods and the Community Data Program to programmatically collect housing data using a process called "web scraping" and will be presenting his insights from this experience.