Are you buying an apartment? How to hack competition in the real estate market with data monitoring

[This article was first published on r – Appsilon Data Science | We Provide End­ to­ End Data Science Solutions, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In the last couple of years, real estate companies have shifted their focus to the digital world, and now almost all investments have an online system showing what apartments are available. This is very convenient for their potential clients, as they can easily become familiar with the apartments on offer. Things become interesting when all available data is monitored on a weekly basis, and sales progress is analysed.

Why is this so important? As sales dynamics are crucial figures that are rarely represented as a number, it gives you an edge in negotiations. Knowing the dynamics enables you to better plan your cash flow over time, and increase your asset efficiency in the short term.

This investigation helped me to fight against the odds of the current real estate market in Poland, and close apartment deals on better terms.

How can you benefit as an individual client?

  1. You have an additional edge in price negotiations.
  2. You can choose the right moment to buy.
  3. You can predict the next move of a developer. Was the investment successful? Will they continue to the next phase? What’s their current financial situation?

Also, this approach can be beneficial to a real estate company through monitoring competition?

  1. You have first-hand knowledge about market demand in a given location.
  2. You can recognise patterns in your competitor’s sales process.
  3. You can better understand clients’ needs. Which types of apartments are sold more quickly?

Example results after monitoring one of the locations in Warsaw

The investment is built in the Warsaw eastern district of Praga Południe. This is an area with great potential. Currently, there are a lot of apartment buildings and places that wait for renovation or replacement. However, it is very well connected with the city center (15 minutes by public transport or 10 minutes by car). Another advantage is that there are a lot of points of interests like shopping malls, parks, schools or medical service points.

The monitored investment is built in the eastern Warsaw district of Praga Południe. This is an area with great potential. Currently, there are a lot of apartment buildings and places waiting to be renovated. However, it is very well connected with the city centre (15 minutes by public transport or 10 minutes by car). There are also a lot of facilities nearby, such as shopping malls, parks, schools, or medical service points.

Here are the basic characteristics of the monitored investment:

  • There are 135 apartments for sale.
  • There are 8 floors in the building; each floor has 14 to 16 apartments.
  • Investment starts from an empty plot with construction permission, work on the site started in Q1 2018. Planned date of completion is Q4 2019.
  • The real estate developer began selling apartments in April 2018.

After collecting the first data from the investment website, we can see the current offers of different apartment types:


This is good to know, but the most interesting data comes from the periodical monitoring of sales progress. Below you can see the timeline representing the number of apartments sold each week:

What else can we see? For example, what is the current state of sales progress?

It is worth remembering that construction began in Q1 2018. Currently, only the building’s foundation is ready, but almost half of the apartments are already sold.

Predicted date of selling all apartments

Let’s see what the sales dynamic is by apartment type. We can also try to make a simple prediction of when all apartments will be sold.


What we can see from this visualisation:

  • If the selling pace continues, almost all apartments will be sold by May 2019. This is over 6 months before the planned construction completion date.
  • However, not all apartments have the same sales dynamic. The fastest selling apartments have 1 or 2 rooms. The biggest apartments, with 4 rooms, are less popular – only 3 have sold.

Did potential buyers change their mind? Could they get a mortgage loan?

Another interesting observation is about apartments that had sold, but later became available again. There are multiple reasons why this happens. It is important to understand how the selling process works. The buyer has to sign an agreement with the developer before requesting a loan from a bank. There is always a possibility that after officially signing for an apartment, the bank can reject your loan request. The buyer can also change their mind and withdraw for other reasons.

In the observed data, there was only one situation like this.  A 2-room apartment with a separate kitchen was sold at the end of August 2018, but, in mid-September, became available again.

Data scraping

Gathering valuable data for analysis is the foundation of every data science process. When the data comes from an external online source, one of the methods is data scraping. This technique means that the data is “scraped” from a website by a web crawler (other names are “robot” or “scraper”). The robot parses text into a machine-readable format, so then it can be analysed.

Companies are aware of this process, so they usually protect their websites and online systems. Some of the techniques are:

  • Displaying content in a dynamic way (e.g. using JavaScript) or complicating website structure, so it is harder for a crawler to parse.
  • Protecting content with captcha or other robot-detection tools.
  • Anomaly detection systems that analyse behaviour between requests and ban suspicious visitors.

At Appsilon we know how to deal with all of these obstacles. All of them make web scraping harder, but are not bulletproof.

Collecting the data from the investment page

In the case described in this article, I monitored one of the locations in Warsaw. The real estate developer is selling apartments in this location using an online system, which contains the following table of apartment availability.

This system didn’t have any captcha or anomaly detection protection. The only impediment here is that the content is loaded dynamically using JavaScript, and there is no open API endpoint where we can just request the data. You are able to view the data only after clicking through the interface.

The solution is a web crawler simulating human behaviour, clicking through the interface. Imagine that you have a robot-employee, who monitors all the information you need, on a weekly basis, doesn’t get tired or bored, and is 100% precise.

I used Google’s Puppeteer.js technology; it can replicate a human behaviour in a browser running in the background. Here is the source code of a scraper:


const puppeteer = require('puppeteer');
(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(''); // Real URL address is confidential
  await page.waitFor(2000);

  const nextSelector = '//input[contains(@alt, ">")]'
  var next = await page.$x(nextSelector)

  while (next.length > 0) {
    const cells = await page.$x('//*[@id="gwProducts"]/tbody/tr/td');
    var vals = [];
    var val = null;

    for (var i = 0; i < cells.length; i++) {
      val = await page.evaluate(x => x.textContent, cells[i])
      if (val == "F1") vals.push([]); // The quickest (dirty) way to recognize row beginning.
      if (val == "zobacz" || val.trim() == "") continue; // The quickest (dirty) way to recognize row end.
      vals[vals.length - 1].push(val)

    vals.forEach(row => {
      console.log(row.join(";")) // Robot prints collected rows to the stdout

    await next[0].click()
    await page.waitFor(2000);
    next = await page.$x(nextSelector)

  await page.screenshot({path: 'example.png'});
  await browser.close();


And we can run the crawler with the command below. Just configure a CRON or other job to run this weekly:

node scraper.js > data-YYYY-MM-DD.csv 

This is just one of a wide range of techniques that can be used for scraping. Let us know if you want to know more about collecting data and other business cases from our experience!


As a private investor, I need to excel in my investments. There is no place for bad bets, which is why I use all available cards in the deck to make sure I do everything in my data science power. This analysis gave me the information about the investment’s financial performance, and I could predict how much time I had to close the deal for my dream apartment. I had more time for the final decision, and I was more confident during negotiations. Remember that buying an apartment at the latest possible moment gives you short-term alternatives, instead of freezing your capital right away. I had an edge that other buyers did not, so I could easily tackle false claims about the popularity of the apartments, and not feel pressured into buying. Each sales process is a game with unequal distribution of knowledge. Knowledge of sales dynamics will allow you to fight against the odds and close better deals.

Artykuł Are you buying an apartment? How to hack competition in the real estate market with data monitoring pochodzi z serwisu Appsilon Data Science | We Provide End­ to­ End Data Science Solutions.

To leave a comment for the author, please follow the link and comment on their blog: r – Appsilon Data Science | We Provide End­ to­ End Data Science Solutions. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)