Breakthrough in Polish AI: PLLuM as a step towards the future

“That’s one small step for man, one giant leap for mankind.” Neil Armstrong’s words, spoken during the historic Moon landing, have become synonymous with breakthroughs in human history. They resonate in the world of modern technology too, especially in the context of the PLLuM (Polish Large Language Universal Model) initiative. This ambitious project marks a unique moment: it is both a great step for Polish scientists and a giant leap forward for Polish artificial intelligence as a whole.

The PLLuM (Polish Large Language Universal Model) project – symbolizing a big step for Polish scientists and a giant leap for Polish artificial intelligence – opens a new chapter in AI development in Poland, demonstrating the country’s potential and ambitions in the global innovation arena.

PLLuM reflects not only Poland’s technological progress, but also the evolution of our society towards more advanced and universally accessible technology. The project, carried out by leading Polish research institutions, pushes the boundaries of what is possible in AI and positions Poland as a significant player in the global arena of artificial intelligence innovation.

The creation of PLLuM opens a new chapter in AI

The launch of PLLuM, coinciding with the first anniversary of ChatGPT, is a symbolic and significant moment for Polish artificial intelligence. The project not only heralds a new era for AI in Poland, but is also an important step towards the democratization of artificial intelligence technology.

Because PLLuM is trained mostly on Polish-language content, it offers a unique opportunity to develop and improve natural language processing technology for the Polish language. This paves the way for more precise and effective language tools that can be applied in many sectors, from education to business to public administration [1]. By developing AI focused on Polish, PLLuM helps fill the gap in available language tools, which are often limited to dominant languages such as English.

Poland’s innovative strength – national institutions are writing a new page in the world of Artificial Intelligence

The cooperation of Poland’s leading scientific institutions on the PLLuM project is not only a testimony to the growing strength and importance of Polish science, but also a sign that Poland is an active part of the global trend in artificial intelligence development. The involvement of institutions such as Wrocław University of Science and Technology, the NASK National Research Institute, the National Information Processing Institute (OPI PIB), the Institute of Computer Science of the Polish Academy of Sciences, the University of Łódź and the Institute of Slavic Studies of the Polish Academy of Sciences is a significant statement of the strength and potential of the Polish scientific community in the field of AI.

PLLuM responds to the limitations of existing language models, such as their closed nature, high costs and insufficient share of Polish-language content, and so opens new perspectives. It is a significant step towards creating more inclusive and locally adapted technologies. The Polish initiative shows that we don’t have to rely solely on solutions created by global technology giants, but can shape the future of technology ourselves.

The PLLuM project has the potential to become a milestone in the history of Polish science and technology, showing the world that Poland is ready not only to follow global trends, but also to actively shape them and set new paths in the development of artificial intelligence.

The institutions in the consortium have many years of experience in preparing data for training and evaluating natural language processing models. Currently, most models are trained mainly on English language tasks. It turns out that achieving significant quality gains for known benchmarks is increasingly difficult. Still, there needs to be more diverse, multilingual data, especially training and test data for instructions and preference optimization in Polish. The PLLuM project will provide, in addition to the model trained in Polish, also data for fine-tuning language models on Polish tasks and training the model on Polish context and data.

– Jan Kocoń, PhD Eng., from the Department of Artificial Intelligence at Wrocław University of Science and Technology, Scientific Director of PLLuM.

The main difference between PLLuM and ChatGPT lies in their adaptation to specific languages. PLLuM is specifically adapted to the Polish language, which allows for a better understanding of typical Polish expressions, idioms and cultural and historical aspects of Poland [2].

This will also make it possible to fine-tune the best current open language models on these data and to obtain even higher quality multilingual models with improved Polish support. In the future, other institutions will use such open data to train models, and Polish will become one of the languages that count when building models. Also, a rich benchmark dataset added to other such multilingual collections will result in the fact that for different models to achieve better overall multilingual performance, they will also need good quality for Polish

– adds Jan Kocoń, PhD Eng.

Your contribution can change the future of AI

The creators of PLLuM are inviting a variety of contributors – from experts and researchers to institutions and AI enthusiasts. Opportunities for support range from sharing expertise to patronage to helping promote the project [3].

From the perspective of creating our model and dataset, the most important thing for us is that we will have control over the process from start to finish, making it easier to adapt the model to specific applications in the future. We have no guarantee that institutions like OpenAI or even open models like Mistral will be updated often enough to keep up with frequent changes in Polish legislation, for example. You can update your model as often as you like if you have it. Since the work is open, any Polish company or institution will be able to benefit from the experience and build their solutions on the shared results of the project.

– Jan Kocoń, PhD Eng., said.

Having control over one’s own language model and datasets therefore allows for rapid adaptation to specific, local changes, such as updates to Polish legislation. As a result, the model can be effectively adapted to the current needs and specifics of the language, avoiding the delays and limitations that can occur when using models developed by external institutions.

Having your own model, you can update it as often as you like. Since the work is open, any Polish company and institution will be able to take advantage of this experience and build their solutions on the shared results of the project

– the scientist stresses.

The proposal to make data available for testing the model seems particularly valuable, since such data is crucial for the model’s development and effectiveness. PLLuM is not just a technological project, but also a social initiative aimed at building a strong, cooperative community around the pioneering development of Polish artificial intelligence.

Bibliography:

[1] NASK, We can’t afford to fall behind. The first Polish open large language model (PLLuM) will be created, https://www.nask.pl/pl/aktualnosci/5314,Nie-stac-nas-na-to-by-zostawac-w-tyle-Powstanie-pierwszy-polski-otwarty-wielki-m.html [accessed 11.02.2024].

[2] NASK Science, Large Polish Language Model (PLLuM), https://science.nask.pl/research-areas/projects/7322 [accessed 11.02.2024].

Cover photography: Pixabay
