Sudden Deaths in Canada (2019-2022)
Based on the analysis of 535 401 death notices from necrocanada for 2019-2022, no significant surge in sudden deaths was detected in Canada https://archive.ph/ozruH
Archive link
In case Substack keeps removing the images in this document for whatever reason, this report has been archived: https://archive.ph/ozruH
Introduction
I have heard people claim that many sudden deaths are occurring around the world. Some believe that the number of sudden deaths has increased by a huge factor since 2021, yet they do not provide the total number of sudden deaths for recent years. They only list some cases for 2022, which does little to persuade others that sudden deaths have increased since a cutoff year (whatever that is).
I have seen some reports of people trying to list/count the number of sudden deaths:
some listing them and doing it completely manually
others trying to count them programmatically
I am more interested in the latter group of people because we can gather much more hard evidence if we do it at least semi-automatically. The latter group at least provides numbers instead of just selecting some cases of sudden deaths and listing them as evidence to their audience, e.g. "this week has seen more sudden deaths than the previous week" (selection bias). Using machines to search for sudden deaths among the huge sea of death notices involves a lot less selection bias than doing it manually. Some of the people using machines to check for sudden deaths claim that there has been a rise in sudden deaths in Canada since 2021 based on data coming from necrocanada, a huge archive of death notices (in English and French) from various funeral homes around the country. They claim that four times more sudden deaths occurred in 2021 than in 2020, based on detecting sudden death cases among the death notices from necrocanada.
However, I tried to reproduce their results for Canada based on the data from necrocanada (only death notices in English) and I couldn't validate their conclusion that there has been a huge increase in sudden deaths in Canada since 2021. Actually, I don't see any anomaly in the deaths recorded over the past four years (2019-2022) in Canada. Whichever way I looked at the data, nothing sticks out like a sore thumb in the distribution of death notices that would suggest that something totally unusual is occurring in Canada in terms of sudden deaths, or in the other topics analyzed (e.g. the number of death notices mentioning the Canadian Cancer Society or the Heart and Stroke Foundation).
Thus, here is my report on analyzing English death notices from necrocanada for the past four years (2019-2022). I looked for unusual patterns in the data that would point to a surge in sudden deaths since 2021, and I didn’t find any. I also analyzed other topics of interest, such as the number of death notices mentioning 'cancer' or 'Heart and Stroke', since the same people who claim that sudden deaths have increased since 2021 also state that heart attacks and cancer are on the rise. Again, I couldn’t find anything unusual with these topics of interest.
Dataset: necrocanada with only death notices in English
Tools: computer programming and lots of coffee
Size of raw dataset of death notices by year
This is the total number of death notices (HTML pages) from necrocanada that I started with, before doing any preprocessing:
NOTE: these death notices represent the raw datasets of death notices that are used as a basis for generating the datasets of topics.
Methodology
Step 1: Preprocessing of death notices
Before counting the number of death notices mentioning a certain topic (e.g. cancer or died suddenly), the raw datasets need to be preprocessed in a two-step procedure.
We are interested in removing duplicate death notices (preprocessing #1) and notices with an invalid year of death (preprocessing #2).
NOTE: the preprocessing methods explained in this section are called heuristics because removing every duplicated or invalid (wrong date of death) death notice fully automatically is an extremely difficult problem. A given full name can be written in many different ways, a deceased person can have several death notices with dissimilar text, and different years can appear in the text of a single death notice.
Thus different heuristics are applied one after the other to progressively arrive at a cleaner dataset than the one we started with.
Preprocessing #1: Remove duplicates
Three heuristics are applied to the death notices to detect those that are duplicates. All of these heuristics make sure to keep the death notice having the longer main content among its duplicates.
For example, the first of these two death notices about the same person will be kept because its text is longer:
https://necrocanada.com/obituaries-2022/alfred-emil-desjarlais-april-7-1947-september-4-2022-age-75/
The first method simply checks if the URL for a given death notice has the keyword ‘-2.html’ or ‘-updated-’ which both suggest that the corresponding notice is a duplicate.
Examples of duplicates detected by the first method:
evelyn-haidee-ogden-nee-watt-august-27-1931-december-8-2022-91-years-old.html
evelyn-haidee-ogden-nee-watt-august-27-1931-december-8-2022-91-years-old-2.html
william-stanley-burgess-19462022.html
william-stanley-burgess-updated-19462022.html
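As a rough illustration of this first method, here is a minimal Python sketch. The function name and the exact regular expression are my own assumptions for illustration; only the two markers ('-2.html' and '-updated-') come from the description above.

```python
import re

# The two URL markers described above. Anchoring '-2.html' at the end of the
# URL is an assumption made for this sketch.
DUPLICATE_MARKERS = re.compile(r"-2\.html$|-updated-")

def looks_like_duplicate(url: str) -> bool:
    """Return True if the URL carries one of the duplicate markers."""
    return bool(DUPLICATE_MARKERS.search(url))

urls = [
    "william-stanley-burgess-19462022.html",
    "william-stanley-burgess-updated-19462022.html",
]
flagged = [u for u in urls if looks_like_duplicate(u)]
```

Here only the '-updated-' URL ends up in `flagged`; the notice with the longer main content would still have to be chosen among each pair.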
Before applying the second method, you first need to extract the date of birth and/or death from the URL of a given death notice, using several regular expressions applied to the URL of the HTML document.
At least 10 regular expressions are needed to cover enough of the ways in which the URL can be formed. Here is a sample of the patterns that need to be detected in the URL of a given death notice:
fullname-month-day-year-month-day-year
e.g. andrew-william-dumont-april-5-1933-december-30-2020-age-87.html
fullname-nameofday-month-positionofday-year
e.g. dann-lynn-dabels-sunday-december-20th-2020.html
nom-jour-mois-annee-jour-mois-annee
e.g. finneas-daniel-haughian-forbes-14-octobre-2004-4-octobre-2022.html
This pattern is for those who are interested in searching French death notices.
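To give an idea of what such regular expressions can look like, here is a sketch in Python covering only the first two English patterns listed above. The names (`TWO_DATES`, `ONE_DATE`, `extract_dates`) and the exact expressions are assumptions for illustration, not the ones actually used for this report, and a real run would need several more patterns.

```python
import re

MONTHS = ("january|february|march|april|may|june|july|august|"
          "september|october|november|december")
DAYS = "monday|tuesday|wednesday|thursday|friday|saturday|sunday"

# fullname-month-day-year-month-day-year (an optional '-to-' between the dates)
TWO_DATES = re.compile(
    rf"-(?P<bm>{MONTHS})-(?P<bd>\d{{1,2}})-(?P<by>\d{{4}})"
    rf"-(?:to-)?(?P<dm>{MONTHS})-(?P<dd>\d{{1,2}})-(?P<dy>\d{{4}})"
)
# fullname-nameofday-month-positionofday-year
ONE_DATE = re.compile(
    rf"-(?:{DAYS})-(?P<dm>{MONTHS})-(?P<dd>\d{{1,2}})(?:st|nd|rd|th)-(?P<dy>\d{{4}})"
)

def extract_dates(slug):
    """Return (dob, dod) as (month, day, year) tuples, or None if no pattern fits.

    dob is None when the URL only carries the date of death.
    """
    m = TWO_DATES.search(slug)
    if m:
        return ((m["bm"], int(m["bd"]), int(m["by"])),
                (m["dm"], int(m["dd"]), int(m["dy"])))
    m = ONE_DATE.search(slug)
    if m:
        return (None, (m["dm"], int(m["dd"]), int(m["dy"])))
    return None
```

For instance, `extract_dates("dann-lynn-dabels-sunday-december-20th-2020.html")` yields no date of birth and a date of death of December 20, 2020.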
The second method is only based on death notices whose URL has the full two dates: date of birth (DOB) and date of death (DOD), e.g. andrew-william-dumont-april-5-1933-december-30-2020-age-87 or joan-smith-february-19-1940-to-december-13-2022.
It checks if the DOB and DOD for a given death notice were already found from another death notice based on the corresponding URL. If it is the case, then it is very likely that we found a duplicate because it is very rare that two people born on the same day also died on the same day.
Examples of duplicates detected by the second method:
https://necrocanada.com/obituaries-2022/alfred-emil-desjarlais-april-7-1947-september-4-2022-age-75/
https://necrocanada.com/obituaries-2022/arnold-anthony-hendel-may-28-1931-may-1-2022-age-90/
https://necrocanada.com/obituaries-2022/arnold-anthony-heindel-may-28-1931-may-1-2022-age-90/
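The second method can be sketched as follows, assuming the dates have already been extracted from each URL. The record layout (dicts with 'url', 'dob', 'dod' and 'text' keys) and the function name are hypothetical; the rule of keeping the notice with the longer main content is the one stated earlier.

```python
def dedupe_by_dates(notices):
    """Group notices by (dob, dod) and keep the longest text in each group.

    notices: iterable of dicts with 'url', 'dob', 'dod' and 'text' keys.
    Notices missing either date are kept untouched, since this method
    only applies when both dates are known.
    Returns (kept, dropped).
    """
    best = {}            # (dob, dod) -> notice with the longest text so far
    undated = []
    dropped = []
    for notice in notices:
        if notice["dob"] is None or notice["dod"] is None:
            undated.append(notice)
            continue
        key = (notice["dob"], notice["dod"])
        current = best.get(key)
        if current is None:
            best[key] = notice
        elif len(notice["text"]) > len(current["text"]):
            dropped.append(current)   # the earlier, shorter notice loses
            best[key] = notice
        else:
            dropped.append(notice)
    return undated + list(best.values()), dropped
```

Note that the two URLs above spell the surname differently ('hendel' and 'heindel'), which is exactly the kind of duplicate that a comparison on names alone would miss.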
The third method reads the first 60% of the main part of the death notice and checks if the hash of the selected text was already computed before. If it is the case, then we have found a potential duplicate. I say potential because two death notices for different persons may start with the same text (e.g. general directives from a funeral home might take more than the first 20% of the death notice).
Examples of duplicates detected by the third method:
https://necrocanada.com/obituaries-2022/johannegauthier-le-lundi-2-fevrier-1959-le-mardi-11-janvier-2022/
https://necrocanada.com/obituaries-2022/johannegauthier-le-mardi-24-fevrier-1959-le-mardi-11-janvier-2022/
https://necrocanada.com/obituaries-2022/jeannearsenault-leblanc-2022/
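Here is a minimal sketch of the third method. The choice of SHA-256, the whitespace and case normalization, and the function names are assumptions on my part; only the idea of hashing the first 60% of the main text comes from the description above.

```python
import hashlib

def prefix_fingerprint(text, fraction=0.6):
    """Hash the first `fraction` of the main text of a death notice."""
    # Normalizing whitespace and case is an assumption of this sketch.
    normalized = " ".join(text.split()).lower()
    cutoff = int(len(normalized) * fraction)
    return hashlib.sha256(normalized[:cutoff].encode("utf-8")).hexdigest()

def find_potential_duplicates(notices, fraction=0.6):
    """notices: iterable of (url, main_text) pairs.

    Returns a list of (url, url_first_seen) pairs whose prefixes hash
    to the same value.
    """
    seen = {}
    flagged = []
    for url, text in notices:
        fingerprint = prefix_fingerprint(text, fraction)
        if fingerprint in seen:
            flagged.append((url, seen[fingerprint]))
        else:
            seen[fingerprint] = url
    return flagged
```

One limitation of this sketch is that two notices of different lengths get different cutoffs, so a longer rewrite of the same notice may not be flagged. The flagged pairs are only candidates and still need a manual check.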
Finally, the duplicates found from methods 2 and 3 can be manually checked to make sure they are actually duplicates and not wrongly flagged as such.
Another method that was tested but not selected (it doesn't scale well with lots of data) is to split the full name of a deceased person into its parts and check whether each part can be found in another full name that has exactly the same dates of birth and death. It also rejects parts that are composed entirely of digits or that are only one character long (e.g. the letter 'p' would be one of these parts if the full name is john-p-francis).
Examples of duplicates detected by this method:
james-jim-paul-july-27-1970november-20-2022.html
james-jim-paul-wilson-july-27-1970november-20-2022.html
beaton-walsh-thursday-february-24th-2022.html
dr-beaton-j-j-walsh-thursday-february-24th-2022.html
johndavidpoirier-new-glasgow-new-victoria-1950-2022.html
davepoirier-new-glasgow-new-victoria-1950-2022.html
joe-david-whitall-wednesday-july-13th-2022.html
joseph-david-whitall-wednesday-july-13th-2022.html
Preprocessing #2: Remove death notices with invalid year of death
The second and last preprocessing step is about removing any death notice with an invalid year of death. Thus a death notice from a given year should have a year of death corresponding to said year.
The first method simply checks whether the URL of a death notice contains an invalid year of death, e.g. suzanne-moncton-2000-2020.html should be rejected if the year you are analyzing is 2021 since that person died in 2020.
Examples of death notices with invalid dates of death because they are not from 2022 (based on this first method):
https://necrocanada.com/obituaries-2022/carolyn-marie-drinkwater-oneill-october-21-1945-december-21-2019-age-74/
https://necrocanada.com/obituaries-2022/henryloewen-1932-2020/
https://necrocanada.com/obituaries-2022/donald-frank-loro-june-8-1945-june-30-2020-age-75/
https://necrocanada.com/obituaries-2022/ashleywade-1991-2021/
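A simplified sketch of this URL check is given below. It assumes that when the last part of the URL contains at least two four-digit years, the last one is the year of death; URLs with fewer than two years are left undecided. This rule and the function names are my own simplification for illustration, not the full set of patterns used for the report.

```python
import re

YEAR = re.compile(r"(?:19|20)\d{2}")

def death_year_from_url(url):
    """Return the year of death found in the URL, or None if undecided."""
    slug = url.rstrip("/").rsplit("/", 1)[-1]   # ignore 'obituaries-2022/'
    years = [int(y) for y in YEAR.findall(slug)]
    if len(years) < 2:
        # Only one year (or none): it could be a birth year, so don't guess.
        return None
    return years[-1]

def has_invalid_death_year(url, analyzed_year):
    """True if the URL clearly shows a year of death other than analyzed_year."""
    year = death_year_from_url(url)
    return year is not None and year != analyzed_year
```

With this sketch, `has_invalid_death_year("suzanne-moncton-2000-2020.html", 2021)` is true, while a URL that only carries a birth date is left for the text-based check.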
The second method checks if an invalid year of death appears in the first 20% of the main part of the death notice. If it is the case, then that death notice will be rejected.
The reason for focusing on the first 20% of a death notice when detecting an invalid year of death is that the year of death is usually mentioned at the start of a death notice. Other years can be mentioned elsewhere in the text, so reading too far into a death notice would wrongly reject many notices, e.g. a death notice from 2022 could mention in the middle of it that the deceased's brother died in 2020.
Examples of death notices with invalid dates of death because they are not from 2022 (based on this second method):
https://necrocanada.com/obituaries-2022/rogers-lily-christine-december-04-1936/
https://necrocanada.com/obituaries-2022/gillert-william-morgan-june-23-1982/
https://necrocanada.com/obituaries-2022/rodzinyak-kenneth-james-june-17-1961/
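The text-based check could look roughly like the sketch below. What counts as an "invalid year of death" is an assumption here: I treat any year from a list of candidate years (2013 to 2022 by default, minus the analyzed year) as invalid, so that a birth year such as 1945 is not mistaken for a year of death. The function names and the default range are hypothetical.

```python
import re

STANDALONE_YEAR = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")

def head(text, fraction=0.2):
    """Return the first `fraction` of the main text."""
    return text[: int(len(text) * fraction)]

def mentions_other_death_year(text, analyzed_year,
                              candidate_years=range(2013, 2023),
                              fraction=0.2):
    """True if a candidate year other than analyzed_year appears early on."""
    found = {int(y) for y in STANDALONE_YEAR.findall(head(text, fraction))}
    return any(y in found for y in candidate_years if y != analyzed_year)

# Made-up notice text, for illustration only.
notice = ("Jane Doe, born in 1945, passed away on May 1, 2022. "
          + "She loved gardening. " * 20
          + "Her brother died in 2020.")
rejected = mentions_other_death_year(notice, 2022)
```

In this made-up example the notice is kept (`rejected` is false) because the mention of 2020 sits at the very end of the text, well outside the first 20%.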
Finally, another manual check can be done to determine that only truly invalid death notices were rejected by these heuristics.
Size of preprocessed dataset of death notices by year
This is the total number of death notices (HTML pages) from necrocanada after applying the previous preprocessing:
NOTE: these death notices represent the preprocessed datasets that are used as a basis for counting the topics found in the death notices by year.
Step 2: Count topics from preprocessed datasets of death notices
These are the topics that were searched on the death notices from necrocanada:
cancer
Canadian Cancer Society
Cancer Foundation
Heart and Stroke
Heart & Stroke
Gofundme
died suddenly
passed away suddenly
sudden passing
Thus, the HTML page for a given death notice was searched for any mention of these topics. The search was limited to the main part of the death notice. If the topic being searched is related to sudden deaths, the search is further limited to the first 30% of the main content. The reason is that 'passed away suddenly' and other related topics are usually mentioned at the very start of a death notice. Hence, we lower the likelihood of including cases where they are talking of someone else dying suddenly other than the actual deceased person (e.g. the deceased's husband/wife).
At the end of step 2, a dataset of topics is generated for each analyzed year, containing the following information:
For a given topic, the number of death notices mentioning it for a given year
For a given topic, the file path of the HTML page where said topic was found is also recorded
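Step 2 can be sketched as below. The topic list and the 30% limit for topics related to sudden deaths come from this section; the case-insensitive substring matching, the input layout (pairs of file path and main text) and the function name are assumptions made for illustration.

```python
TOPICS = [
    "cancer", "Canadian Cancer Society", "Cancer Foundation",
    "Heart and Stroke", "Heart & Stroke", "Gofundme",
    "died suddenly", "passed away suddenly", "sudden passing",
]
SUDDEN_TOPICS = {"died suddenly", "passed away suddenly", "sudden passing"}

def count_topics(notices, topics=TOPICS, sudden_fraction=0.3):
    """Map each topic to the file paths of the notices that mention it.

    notices: iterable of (file_path, main_text) pairs for one year.
    The number of notices for a topic is len() of its list.
    """
    hits = {topic: [] for topic in topics}
    for path, text in notices:
        lowered = text.lower()
        start = lowered[: int(len(lowered) * sudden_fraction)]
        for topic in topics:
            # Sudden-death topics are only searched at the start of the text.
            scope = start if topic in SUDDEN_TOPICS else lowered
            if topic.lower() in scope:
                hits[topic].append(path)
    return hits
```

With plain substring matching, a notice that mentions the Canadian Cancer Society is also counted under 'cancer', so the topics are not mutually exclusive.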
Results
Results without any preprocessing
If we don't do any kind of preprocessing on the death notices and take a "brute search" approach by searching for topics over the raw datasets, we arrive at the results shown in the following table. That is, we don't care about duplicates and death notices with invalid years of death. Surprisingly, the relative numbers (%) are similar to those obtained with preprocessing, because the few bad apples are not enough to spoil the huge basket of death notices: with a dataset this large, the invalid death notices that creep in barely move the percentages. However, if the dataset were much smaller (maybe less than 10% of its size) and/or contained too many invalid death notices, we would feel the negative effects of these bad cases without the correct preprocessing.
The following table presents the number of death notices where a certain topic is found for a given year. Two numbers are given for each topic and year: the absolute number and the relative number (within square brackets). For the topic 'cancer' and year 2020, we have 16 253 death notices where 'cancer' is found, which represents 12.05% of all death notices for that year.
In the case of 'Heart & Stroke', the numbers are so small that we can practically say they represent 0% of all death notices, i.e. very few people write the name of this foundation that way. As you can see from the table, this holds for each of the past four years like clockwork.
NOTE: It is important to look at relative numbers because each year people might not be writing death notices at the same rate as other past years. Thus, it is important to look at how many death notices mention a certain topic in proportion to all death notices for a given year instead of focusing only on the absolute numbers which might give you a skewed view of the situation.
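To make this point concrete, here is a tiny sketch with made-up numbers (they are NOT taken from the tables of this report). The function name is hypothetical.

```python
def relative_share(topic_count, total_notices):
    """Percentage of a year's death notices that mention a topic."""
    return 100.0 * topic_count / total_notices

# Hypothetical numbers, for illustration only.
year_a = relative_share(2_000, 100_000)   # 2 000 mentions out of 100 000 notices
year_b = relative_share(2_600, 130_000)   # 2 600 mentions out of 130 000 notices
```

In this made-up case the absolute count grows by 30% from one year to the next, yet both years come out at 2.0% because the total number of death notices grew by the same proportion.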
Results with preprocessing
After applying the preprocessing to the raw datasets of death notices, we count the number of death notices in which a certain topic is found for a given year.
The following table gives the absolute and relative numbers (within square brackets) for each topic and for a given year.
Thus for the topic 'cancer' and year 2020, we have 16 005 death notices in 2020 where 'cancer' is found which represents 12.07% of all death notices for that year.
NOTE: check also the size of the preprocessed datasets of death notices upon which this table was computed.
Conclusion
As we can see from the table showing the number of death notices where a topic of interest is found for a given year, nothing unusual is standing out from this table that would signal something out of the ordinary occurring since 2021 or any other year analyzed. When analyzing the topics found in the death notices, we don't find anything atypical happening for the past four years (2019-2022). Everything seems to be repeating each year in regards to the number of death notices in Canada mentioning a certain topic of interest.
I tried to validate reports claiming that there were four times more sudden deaths in 2021 than in 2020, based on identifying specific keywords about dying suddenly in death notices from necrocanada. However, I didn’t find this incredible surge of sudden deaths when searching for specific keywords (e.g. ‘died suddenly’ or ‘passed away suddenly’) in the death notices from necrocanada for the past four years (2019-2022). The numbers don’t even suggest a doubling of cases from 2020 to 2021 for death notices mentioning topics related to dying suddenly. For instance, the increase in the relative number of death notices mentioning ‘sudden passing’ from 2020 to 2021 is so small (going from 1.73% to 2.0%) that it is not worth talking about, let alone sounding the alarm.
The results obtained semi-automatically by certain people, suggesting that a surge in sudden deaths has occurred since 2021 (based on death notices from necrocanada), could not be reproduced here. It seems that their search method (which is difficult to re-implement since few details are given) found a lot more sudden deaths in 2021-2022 than in previous years because its selection bias was so strong that it ignored a huge chunk of the sudden deaths before 2021-2022, which led them to a questionable conclusion. They believed that sudden deaths didn't really happen in great numbers before these supposed cutoff years because their flawed methodology suggested that interpretation, which does not hold up when we analyze the dataset of death notices from necrocanada. I didn't see any significant surge of sudden deaths in the past years. Actually, the number of sudden deaths has followed a stable trend these past years, as can be seen from the previous table.
Some might think that you can query your way out of this problem of searching for sudden deaths by relying on the Google search engine and providing it with the right keywords and date limits. Google search will not give you an extremely accurate portrait of the actual content of a website, particularly if you are querying for years far in the past and for websites that are not very popular. Google will not dedicate its powerful computing resources to finding all your needles in a haystack. It will, however, give results that are good enough to satisfy as many people as possible.
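As a quick sanity check on the ‘sudden passing’ figures quoted above, the change from 1.73% to 2.0% can be expressed as a fold change and compared with the claimed fourfold increase. The function name is just for illustration.

```python
def fold_change(before_pct, after_pct):
    """How many times larger the later percentage is than the earlier one."""
    return after_pct / before_pct

# Relative numbers for 'sudden passing' in 2020 and 2021, as quoted above.
ratio = fold_change(1.73, 2.0)
print(f"{ratio:.2f}x")  # prints 1.16x
```

A factor of about 1.16 (roughly a 16% relative increase) is nowhere near a factor of 2, let alone 4.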
Thus beware of the selection bias creeping into your search method.
Q and A
I will just answer the best questions that I have received about the results from this report.
Why should I trust you?
antiauthority1984 asks a very penetrating question. I am still kind of surprised, though, that someone who is anti-establishment is asking for reasons to trust a stranger on the internet. Isn't “Don’t trust the government” one of your favorite expressions? Just as you claim not to trust authorities, you should also not put blind faith in data just because it goes against the narrative.
Only once did I use the word “trust” in this report (other than here) and it was about trusting that people are smart enough to decipher my email address. On the contrary, you should be extremely skeptical of any data no matter where it comes from. Always do your own work to validate other people’s conclusions.
Only by reproducing someone else’s results can you be confident that those results are worth exploring further.
What is your real name and why are you anonymous?
anonymous34029 asks a very intriguing question, since I thought that there was a common understanding among anonymous users on the internet of why someone would personally choose to be anonymous. Also, anonymous34029 raises this profound question while supporting anonymous professionals who write articles that go against the main narrative.
I will provide my personal details once you provide me your home address and then I will send them to you as soon as possible.
Why did you only get data from 2019-2022? Why not from 1990? or 212 BC?
ilovebirds09 asks three very important questions. As stated in the introduction of this report, I was mainly interested in trying to validate the results of some reports claiming that a huge increase in sudden deaths started in 2021, because their authors believed something happened in 2021 on a major scale that could explain these sudden deaths. However, as shown in this report, no significant surge in sudden deaths was detected in Canada based on analyzing 535 401 death notices from necrocanada for 2019-2022 (since this was the basis for the claims made in these reports).
Also, for those who don’t know much about data analysis, you can’t just press a magic button that will instantaneously get you all the preprocessed data ready to be used for your number crunching. For instance, it took more than two weeks to get the whole raw dataset of four years of death notices (more than half a million HTML documents). There is a major reason for that and it doesn’t have to do with technology, but I won’t get into the details; I think you are smart enough to read between the lines. And this is without considering the preprocessing stage explained in this document and other unforeseen events that can derail your project for a couple of days (e.g. having an old and slow computer does no wonders when processing huge quantities of data).
Finally, necrocanada doesn’t provide lots of data before 2018. There are 212 724 death notices from 2013-2017 (about 42 545 death notices per year on average), which is not a lot when compared to those from 2018-2022 (each of these years has more than 120k death notices).
I guess that I could still redo the whole analysis for the remaining years (2013-2018) but I am tired and fed up with this project. I just want to move on. However, this could be a good opportunity for you to learn data analysis and gain a deep appreciation for what it takes to process huge amounts of data to get to the bottom of things. Maybe you could uncover deep answers about the universe in those remaining years? What do you think? Do you want to take over from here?
But thank you anyway for your three great questions, ilovebirds09.
You don’t see lots of sudden death cases in necrocanada because the words ‘died suddenly’ were removed and you can only see them in the local obituaries. Have you thought about that?
i_only_trust_evidence451 raised a good point that only someone with a deep appreciation for evidence could think of. However, when asked to provide a long list of death notices (with URLs pointing to local obituaries) that had these keywords removed from them, the answer was total silence. I can only work with concrete evidence, not speculation.
Also, as stated in this report, which they obviously didn’t read because they prefer to speculate all day long rather than face reality, I also checked for other keywords (e.g. ‘cancer’, ‘sudden passing’, ‘Heart and Stroke’) in the 535 401 death notices from necrocanada. I didn’t find anything unusual in these death notices from 2019-2022. That means that i_only_trust_evidence451 et al. believe that powerful entities are removing not only the words ‘died suddenly’ from death notices but also a long list of other keywords, and doing it in a way that consistently produces similar distributions of occurrences of these topics across the years. Believe whatever you want without proof, but I would rather work with concrete evidence.
Contact
To contact me with any question about this report (or if you need to talk to someone): avmloe193 at proton dot me
I “trust” that you are able to get back the correct email address in its original form.