Python Script to Combine Parquet Files
Today’s blog post is quick and simple, a script I find myself using quite often these days. It takes a collection of CSV or Parquet files and combines them into a single file. I find it useful whenever I need to query the same data across multiple files. Beats importing the individual files into a DB client like DBeaver. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 import os import pandas as pd # Directory containing your files dir_path = '/Users/yunier/Downloads' # Output file name output_file = os.path.join(dir_path, 'combined_output1.parquet') # List of files to combine (CSV and Parquet) files = [f for f in os.listdir(dir_path) if f.endswith('.csv') or f.endswith('.parquet')] # List to hold DataFrames dfs = [] for file in files: file_path = os.path.join(dir_path, file) if file.endswith('.csv'): df = pd.read_csv(file_path) elif file.endswith('.parquet'): df = pd.read_parquet(file_path) else: continue df['source_file'] = file # Track source of data. dfs.append(df) # Combine all DataFrames if dfs: combined_df = pd.concat(dfs, ignore_index=True) combined_df.to_parquet(output_file) print(f'Combined file saved to: {output_file}') else: print('No CSV or Parquet files found to combine.')
JSON:API Implementing Filtering
Introduction It has been over a year since I last wrote about JSON:API, since then the team behind JSON:API has published version 1.1 of the JSON:API specification. I would like to continue my journey of documenting JOSN:API in .NET by introducing a really cool feature to my Chinook JSON:API project, filtering. The first thing to know about filtering in JSON:API is that the spec itself is agnostic to any filtering strategies. Meaning it is up to you to define how filtering should be handled by your API. In my opinion, this has always been a drawback of the JSON:API spec, I believe in that it would have been a better choice for the spec if it had decided on a filtering strategy, but that is discussion for another day. While the spec does not favor any filtering strategy it does have some recommendations. ...
Circular Arrays
Most developers are familiar with the modulo opertor, it is often presented in an example that determines if a number is odd or even as shown in the code below taken from Testing whether a value is odd or even. 1 2 3 4 5 6 7 function isEven(n) { return n % 2 == 0; } function isOdd(n) { return Math.abs(n % 2) == 1; } Determining if a number is odd or even is just one use case for the modulo operator. Another use case that I learn a while back is that you can use the modulo operator to set a range, a boundary if you will, allowing you to rotate the array, this is because fundamentally A mod B is between 0 and B - 1 or another way to think about it, 0 <= A < B. ...
Go - Multiple Return Values
I’ve been spending the last few weeks learning Go by reading Learning Go by Jon Bodner, so far I’ve been enjoying learning about Go, though there is still one thing I keep tripping over, in Go, you can return one or more values, for me Go is the first language that I have worked with that does that, in every other language I had to introduce a custom discriminating union to achieve what Go does natively. ...
Consul Service Mesh in Kubernetes - Part 1
Introduction I have been spending my last few weeks sharpening up my Kubernetes skills, one area that I focused on was how to enable and use a Service Mesh in Kubernetes. A service mesh is a layer in your infrastructure that enables control of inbound and outboard traffic. It controls the traffic of any app or service that uses the network. Kubernetes offers a wide range of Service Meshes, in this blog post I am going to concentrate on HashiCorp’s service mesh offering, Consul, though you may see other refer to it as Consul Connect, Consul Connect is a set of features that were added to Consul was in 2018 to enable service mesh support. ...
Rule Engines In .NET
Introduction I am working on a project that requires the usage of rules engines to enforce business rules, I am unsure if I should roll out my own custom implementation, probably a bad idea, or if I should use an existing project. To help me make a decision I will need to look at the current options for rules engines available in .NET, I need to understand their capabilities and limitations. In .NET the most well-known rules engine is probably NRules, the project has been around for some years and has good documentation. I also know that Microsoft created its own rules engine, RulesEngine back in 2019. Then per awesome-dotnet I could use Plastic or Peasy.NET but I opted out on looking at those projects, for my use case I think NRules or RulesEngine will do. ...
Use Custom OpenAPI Specification File In .NET
.NET has the ability to auto-generated OpenAPI specifications based on your code. You can use different decorators on your code like ProducesResponseTypeAttribute or ConsumesAttribute to produce more descriptive response details for web API help pages generated by tools like Swagger. What if you didn’t want to use the autogenerated spec, what if you instead wanted to expose an already written spec, perhaps because you follow an API-first approach to building an API instead of a Code-First approach. How would you expose that OpenAPI file? ...
Power Up Integration Tests With Test Containers
Introduction In my blog post Integration Testing Using WebApplicationFactory I spoke about the benefits of testing a .NET Core Web API using WebApplicationFactory. The idea is that WebApplicationFactory creates a local HTTP server in-memory, meaning that when using WebApplicationFactory you are not mocking the HTTP request made to your API, you are actually using the API as if it were hosted in a live environment. The benefit here is that your test code seats in the middle of the Web API and the client code calling the API, meaning you can now test how the API behaves under certain requests from the client. One drawback of using WebApplicationFactory would be having to mock API dependencies, for example, the database. A common option for .NET developers using a relational database like SQL Server is to use SQLite in the integration tests, however, even that solution suffers from other drawbacks, our friend Jimmy Bogard goes into more detail in his blog Avoid In-Memory Databases for Tests. What if instead of faking the database we actually used a real live database in our integration tests? There is a way, how? Well, with Docker. ...
Tracking API Changes With Optic
Over the last few months, I have been brushing up on API testing, specifically around contract testing. According to Postman, an API contract is a human- and machine-readable representation of an API’s intended functionality. It establishes a single source of truth for what each request and response should look like—and forms the basis of service-level agreements (SLAs) between producers and consumers. API contract testing helps ensure that new releases don’t violate the contract by checking the content and format of requests and responses. ...
Stricter Types In TypeScript
Recently TypeScript wizard Matt Pocock made a Twitter thread on branded types. At first, I did not know what he was talking about, I thought it was a new TypeScript feature being introduced in TypeScript 5 but upon closer look, I realized that it was not a new feature but rather a technique that I already knew, opaque types. I first learned about opaque types from Evert Pot in his blog post Implementing an opaque type in typescript, though I guess now the TypeScript community prefers to call them branded types, the name doesn’t matter, the problem being solved is the same, preventing types from being interchangeable. ...