{"slug": "estimating-open-source-project-lifecycle-using-the-bass-model", "title": "Estimating Open source project lifecycle using the Bass model", "summary": "Researchers applied the Bass diffusion model to estimate the lifecycle of open-source projects by fitting monthly developer contributions and extrapolating future activity. The model's use of the hyperbolic secant function parallels the Parr model for project staffing, suggesting a potential connection between developer adoption and problem-solving dynamics. The study analyzed data from projects running for at least a year to predict the duration of major active development.", "body_md": "## Estimating Open source project lifecycle using the Bass model\n\nIs it possible to reliably estimate the elapsed time that a multi-person Open source project spends under major active development, once it has been running for a year or so, and attracted some developers?\n\nThe paper [Project Life Cycles in Open-Source Software](https://arxiv.org/abs/2605.12738) by [Das](https://srdas.github.io/research.htm), Ieroshenko, Jain, Qiu, Chin, and Granger fits a [Bass diffusion model](https://en.wikipedia.org/wiki/Bass_diffusion_model) to the number of monthly developers contributing to a project, and then extrapolates the fitted equation into the future. Is the Bass model a good fit to this kind of data, and how reliable might its prediction be?\n\nWhat first caught my attention in this paper was the appearance of the `sech` function (i.e., the [hyperbolic secant](https://en.wikipedia.org/wiki/Hyperbolic_functions): `sech(x) = 1/cosh(x)`) in the derived formula. The only other place I have encountered this function in software engineering is the [Parr model](https://ntrs.nasa.gov/search.jsp?R=19870067647) of project staffing distribution, e.g., effort in hours per week. 
What’s more, both instances involve `sech` squared, i.e., `sech²(x)`.\n\nIs this use of `sech²` a coincidence, or is there an interesting connection? Let’s look at the paper.\n\nThe Bass diffusion model, or just Bass model, assumes that the [number of people buying a new product](https://scholar.archive.org/work/fesmag4wlzbi3jvnvaegeskdbi/access/wayback/http://stat.asu.edu/~dieter/courses/oldcourses/mat451/fall08/References/bass_1969.pdf) is controlled by two factors: 1) *independents* who have a constant probability, `p`, of buying it, and 2) *imitators* whose probability of purchase depends on `q` times the number of existing users of the product (see section 3.6.3 of [my book](http://knosof.co.uk/ESEUR/)). The Bass model has been extended to handle successive, overlapping generations of a product, e.g., [IBM mainframes](https://www.academia.edu/download/30911787/MahajanMuller1996.pdf).\n\nI have not seen the Bass model applied to software lifecycles before (a quick search found a [2014 paper](https://www.academia.edu/download/39251786/Remco_etal__2014.pdf) using it to model the time-evolution of package dependencies).\n\nThe authors of the new paper introduced `sech` by normalising two variables in the Bass equation:\n\nTime is normalised by dividing by the time of peak development, and the number of developers is normalised by dividing by the peak number of developers, giving:\n\n, where , and . It’s not possible to fit this equation to project data because the peak development values are not known (or might not yet have been reached).\n\nThe equation in the Parr model is: , where the values of and are obtained by fitting a regression model to project data. 
The derivation of the Parr model assumes that as project implementation progresses, new problems that need to be solved are discovered (e.g., features to be implemented), an existing problem can spawn at most two new subproblems, and the number of new problems discovered at any time is proportional to the number of remaining problems (cannot find an online version of “An alternative to the Rayleigh curve model for software development effort”).\n\nA connection can be made between the Bass and Parr models by equating the number of developers contributing with the number of problems to be solved, with contributors treated as *independents* or *imitators*. The opportunities for potential contributors are likely to increase as a project starts up and then, for some projects, decrease (projects such as the Linux kernel just keep on going). The problem implemented by a developer could spawn more than two subproblems.\n\nIn practice most of the implementation work on an Open source project is done by a [small percentage of developers](https://dl.acm.org/doi/10.1145/2804360.2804366), with some projects dying after [losing a few core developers](https://arxiv.org/abs/2412.00313). There is also the issue of the same person [contributing using multiple identities](https://github.com/bvasiles/ght_unmasking_aliases).\n\nOne method for checking how well a model predicts future measurements is to compare the equations fitted using all the monthly data and, say, the first 75% of the months. The extent to which both fitted equations agree provides an indication of the likely accuracy of currently unknown future values. The Das et al. paper fits the Bass model to the monthly contributor data from 23 projects. 
The plot below shows the number of monthly developers for numpy (since the project started), along with two fitted Bass models, one for all the data and the other for the first 75% of the data ([code+data](https://www.shape-of-code.com/code-data/2605.12738.tgz)):\n\nThe example project used in the paper has a closer agreement between the two fitted equations, and some of the other projects have much less agreement. The Bass model assumes that monthly contributions are primarily driven by two factors. In practice there could be many other factors driving developer involvement in a project.\n\n[Predicting when a project](https://shape-of-code.com/2024/09/29/modeling-program-loc-growth-with-recurrence-equations/) is likely to stop growing is a notoriously difficult problem. Fitting a [logistic equation](https://github.com/Derek-Jones/ESEUR-code-data/blob/master/regression/linux-logistic-fut.R) to the growth in lines of code is another example of a model fitting the pattern present in the underlying equation (which flattens off).\n\nIt’s possible that weighting developer contribution by the amount of functionality (not lines of code) will produce a closer agreement between theory and practice.\n\nThe Putnam project staffing model predates the Parr model, and [later research](https://shape-of-code.com/2024/11/17/putnams-software-equation-debunked/) found that the Putnam model was also a poor predictor of project durations. 
Both the Parr and Putnam equations can be [derived using hazard analysis](https://www.sciencedirect.com/science/article/pii/S0167642302001065/pdf).\n\nApplying the Bass multi-product generation model to software evolution is now on my to-do list, e.g., [use of PHP versions](https://github.com/Derek-Jones/ESEUR-code-data/blob/master/ecosystems/1710-05570a.R).", "url": "https://wpnews.pro/news/estimating-open-source-project-lifecycle-using-the-bass-model", "canonical_source": "https://shape-of-code.com/2026/05/17/estimating-open-source-project-lifecycle-using-the-bass-model/", "published_at": "2026-05-17 22:02:24+00:00", "updated_at": "2026-06-16 07:25:22.117813+00:00", "lang": "en", "topics": ["machine-learning", "ai-research", "developer-tools"], "entities": ["Bass diffusion model", "Parr model", "Das", "Ieroshenko", "Jain", "Qiu", "Chin", "Granger"], "alternates": {"html": "https://wpnews.pro/news/estimating-open-source-project-lifecycle-using-the-bass-model", "markdown": "https://wpnews.pro/news/estimating-open-source-project-lifecycle-using-the-bass-model.md", "text": "https://wpnews.pro/news/estimating-open-source-project-lifecycle-using-the-bass-model.txt", "jsonld": "https://wpnews.pro/news/estimating-open-source-project-lifecycle-using-the-bass-model.jsonld"}}